Abstract

In this paper we present a variational technique that handles coarse-graining and passing to a limit 
in a unified manner. The technique is based on a duality structure, which is present in many gradient 
flows and other variational evolutions, and which often arises from a large-deviations principle. It has 
three main features: (A) a natural interaction between the duality structure and the coarse-graining, (B) 
application to systems with non-dissipative effects, and (C) application to coarse-graining of approximate 
solutions which solve the equation only to some error. As examples, we use this technique to solve three
limit problems: the overdamped limit of the Vlasov-Fokker-Planck equation and the small-noise limits of
randomly perturbed Hamiltonian systems with one and with many degrees of freedom.
1 Introduction

Coarse-graining is the procedure of approximating a system by a simpler or lower-dimensional one, often in 
some limiting regime. It arises naturally in various fields such as thermodynamics, quantum mechanics, and 
molecular dynamics, just to name a few. Typically coarse-graining requires a separation of temporal and/or 
spatial scales, i.e. the presence of fast and slow variables. As the ratio of ‘fast’ to ‘slow’ increases, some form 
of averaging or homogenization should allow one to remove the fast scales, and obtain a limiting system that 
focuses on the slow ones. 

Coarse-graining limits are by nature singular limits, since information is lost in the coarse-graining 
procedure; therefore rigorous proofs of such limits are always non-trivial. Although the literature abounds 
with cases that have been treated successfully, and some fields can even be called well-developed—singular 
limits in ODEs and homogenization theory, to name just two—many more cases seem out of reach, such as 
coarse-graining in materials [dPC07], climate prediction [SATS07], and complex systems [FR07, NN12]. 

All proofs of singular limits hinge on using certain special structure of the equations; well-known
examples are compensated compactness [Tar79, Mur87], the theories of viscosity solutions [CIL92] and
entropy solutions [Kru70, Smo94], and the methods of periodic unfolding [CDG02, CDG08] and two-scale
convergence [All92]. Variational-evolution structure, such as in the case of gradient flows and variational
rate-independent systems, also facilitates limits [SS04, Ste08, MRS08, DS10, Ser11, MRS12, Mie14].

In this paper we introduce and study such a structure, which arises from the theory of large deviations for 
stochastic processes. In recent years we have discovered that many gradient flows, and also many ‘generalized’ 
gradient systems, can be matched one-to-one to the large-deviation characterization of some stochastic 
process [ADPZ11, ADPZ13, DPZ14, DPZ13, DLZ12, MPR14]. The large-deviation rate functional, in this
connection, can be seen to define the generalized gradient system. This connection has many philosophical 
and practical implications, which are discussed in the references above. 

We show how in such systems, described by a rate functional, ‘passing to a limit’ is facilitated by 
the duality structure that a rate function inherits from the large-deviation context, in a way that meshes 
particularly well with coarse-graining. 

1.1 Variational approach—an outline 

The systems that we consider in this paper are evolution equations in a space of measures. Typical
examples are the forward Kolmogorov equations associated with stochastic processes, but also various
nonlinear equations, as in one of the examples below.

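Consider the family of evolution equations

$$\partial_t \rho^\varepsilon = \mathcal{N}^\varepsilon \rho^\varepsilon, \qquad \rho^\varepsilon|_{t=0} = \rho^\varepsilon_0, \tag{1}$$

where $\mathcal{N}^\varepsilon$ is a linear or nonlinear operator. The unknown $\rho^\varepsilon$ is a time-dependent Borel
measure on a state space $\mathcal{X}$, i.e. $\rho^\varepsilon : [0,T] \to \mathcal{M}(\mathcal{X})$. In the systems of this paper, (1) has a
variational formulation characterized by a functional $I^\varepsilon$ such that

$$I^\varepsilon \ge 0 \quad\text{and}\quad \rho^\varepsilon \text{ solves (1)} \iff I^\varepsilon(\rho^\varepsilon) = 0. \tag{2}$$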

This variational formulation is closely related to the Brezis-Ekeland-Nayroles variational principle [BE76, 
Nay76, Ste08, Gho09] and the integrated energy-dissipation identity for gradient flows [AGS08]; see Section 5. 

Our interest in this paper is the limit $\varepsilon \to 0$, and we wish to study the behaviour of the system in this
limit. If we postpone the aspect of coarse-graining for the moment, this corresponds to studying the limit
of $\rho^\varepsilon$ as $\varepsilon \to 0$. Since $\rho^\varepsilon$ is characterized by $I^\varepsilon$, establishing the limiting behaviour consists of
answering two questions:

1. Compactness: Do solutions of $I^\varepsilon(\rho^\varepsilon) = 0$ have useful compactness properties, allowing one to extract
a subsequence that converges in a suitable topology, say $\varsigma$?

2. Liminf inequality: Is there a limit functional $I \ge 0$ such that

$$\rho^\varepsilon \xrightarrow{\ \varsigma\ } \rho \quad\Longrightarrow\quad \liminf_{\varepsilon\to 0} I^\varepsilon(\rho^\varepsilon) \ge I(\rho)? \tag{3}$$

And if so, does one have

$$I(\rho) = 0 \iff \rho \text{ solves } \partial_t\rho = \mathcal{N}\rho,$$

for some operator $\mathcal{N}$?

A special aspect of the method of the present paper is that it also applies to approximate solutions. By this we
mean that we are interested in sequences of time-dependent Borel measures $\rho^\varepsilon$ such that
$\sup_{\varepsilon>0} I^\varepsilon(\rho^\varepsilon) \le C$ for some $C \ge 0$. The exact solutions are the special case $C = 0$. The main message of
our approach is that all the results then follow from this uniform bound and assumptions on well-prepared initial data.

The compactness question will be answered by the first crucial property of the functionals $I^\varepsilon$, which is
that they provide an a priori bound of the type

$$S^\varepsilon(\rho^\varepsilon_t) + \int_0^t R^\varepsilon(\rho^\varepsilon_s)\,ds \le S^\varepsilon(\rho^\varepsilon_0) + I^\varepsilon(\rho^\varepsilon), \tag{4}$$

where $\rho^\varepsilon_t$ denotes the time slice at time $t$ and $S^\varepsilon$ and $R^\varepsilon$ are functionals. In the examples of this
paper $S^\varepsilon$ is a free energy and $R^\varepsilon$ a relative Fisher Information, but the structure is more general. This
inequality is reminiscent of the energy-dissipation inequality in the gradient-flow setting. Since the
right-hand side of (4) is uniformly bounded by assumption, each term on the left-hand side of (4), i.e. the free
energy at any time $t > 0$ and the time integral of the Fisher information, is also bounded. This will be used to
apply the Arzelà-Ascoli theorem to obtain certain compactness and 'local-equilibrium' properties. These
steps are made precise in each example in this paper.

The second crucial property of the functionals $I^\varepsilon$ is that they satisfy a duality relation of the type

$$I^\varepsilon(\rho) = \sup_f \mathcal{J}^\varepsilon(\rho, f), \tag{5}$$

where the supremum is taken over a class of smooth functions $f$. It is well known how such duality structures
give rise to good convergence properties such as (3), but the focus in this paper is on how this duality structure
combines well with coarse-graining.

In this paper we define coarse-graining to be a shift to a reduced, lower-dimensional description via a
coarse-graining map $\xi : \mathcal{X} \to \mathcal{Y}$ which identifies relevant information and is typically highly non-injective.
Note that $\xi$ may depend on $\varepsilon$. A typical example of such a coarse-graining map is a 'reaction coordinate' in
molecular dynamics. The coarse-grained equivalent of $\rho^\varepsilon : [0,T] \to \mathcal{M}(\mathcal{X})$ is the push-forward
$\hat\rho^\varepsilon := \xi_\#\rho^\varepsilon : [0,T] \to \mathcal{M}(\mathcal{Y})$. If $\rho^\varepsilon$ is the law of a stochastic process $X^\varepsilon$, then $\xi_\#\rho^\varepsilon$ is the law
of the process $\xi(X^\varepsilon)$.
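
To make the push-forward concrete, the following minimal Python sketch computes $\xi_\#\rho$ for a measure supported on finitely many points. The function name `push_forward`, the three-point measure and the choice $\xi(q,p) = q$ are illustrative assumptions and are not taken from this paper.

```python
from collections import defaultdict


def push_forward(rho, xi):
    """Push-forward xi_# rho of a discrete measure.

    rho: dict mapping each state x to its mass rho({x}).
    xi:  coarse-graining map (a hypothetical choice made by the caller).
    """
    rho_hat = defaultdict(float)
    for x, mass in rho.items():
        # All states with the same image xi(x) are lumped together.
        rho_hat[xi(x)] += mass
    return dict(rho_hat)


# Toy example on states x = (q, p): the map forgets the momentum p.
rho = {(0, 1): 0.25, (0, -1): 0.25, (1, 0): 0.5}
rho_hat = push_forward(rho, lambda x: x[0])
```

In this toy example the two states with $q = 0$ are merged, which is the sense in which a non-injective $\xi$ discards information while preserving total mass.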

There might be several reasons to be interested in $\xi_\#\rho^\varepsilon$ rather than $\rho^\varepsilon$ itself. The push-forward $\xi_\#\rho^\varepsilon$
obeys a dynamics with fewer degrees of freedom, since $\xi$ is non-injective; this might allow for more efficient
computation. Our first example (see Section 1.3), the overdamped limit in the Vlasov-Fokker-Planck
equation, is an example of this. As a second reason, by removing certain degrees of freedom, some specific
behaviour of $\rho^\varepsilon$ might become clearer; this is the case with our second and third examples (Section 1.3),
where the effect of $\xi$ is to remove a rapid oscillation, leaving behind a slower diffusive movement. Whatever
the reason, in this paper we assume that some $\xi$ is given, and that we wish to study the limit of $\xi_\#\rho^\varepsilon$ as
$\varepsilon \to 0$.
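The core of the arguments of this paper, which leads to the characterization of the equation satisfied by
the limit of $\xi_\#\rho^\varepsilon$, is captured by the following formal calculation:

$$
\begin{aligned}
I^\varepsilon(\rho^\varepsilon) &= \sup_f \mathcal{J}^\varepsilon(\rho^\varepsilon, f) \\
&\overset{f = g\circ\xi}{\ge} \sup_g \mathcal{J}^\varepsilon(\rho^\varepsilon, g\circ\xi) \\
&\xrightarrow{\ \varepsilon\to 0\ } \sup_g \mathcal{J}(\rho, g\circ\xi) \\
&\overset{(*)}{=} \sup_g \hat{\mathcal{J}}(\hat\rho, g) \overset{(**)}{=:} \hat I(\hat\rho).
\end{aligned}
$$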

Let us go through the lines one by one. The first line is the duality characterization (5) of $I^\varepsilon$. The
inequality in the second line is due to the reduction to a subset of special functions $f$, namely those of the
form $f = g\circ\xi$. This is in fact an implementation of coarse-graining: in the supremum we decide to limit
ourselves to observables of the form $g\circ\xi$, which only have access to the information provided by $\xi$. After
this reduction we pass to the limit and show that $\mathcal{J}^\varepsilon(\rho^\varepsilon, g\circ\xi)$ converges to some $\mathcal{J}(\rho, g\circ\xi)$, at least for
appropriately chosen coarse-graining maps.
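
The inequality obtained by restricting the supremum to functions $f = g\circ\xi$ can be checked in a finite-dimensional toy model. The sketch below uses the quadratic functional $\sum_i (b_i f_i - f_i^2/2)$ on a four-point state space; this functional, the vector `b` and the map `xi` are assumptions chosen for illustration and are not the functionals $\mathcal{J}^\varepsilon$ of this paper.

```python
def full_sup(b):
    """sup over all f of sum_i (b_i f_i - f_i^2 / 2); attained at f = b."""
    return 0.5 * sum(x * x for x in b)


def coarse_sup(b, xi):
    """Same supremum restricted to f = g o xi, i.e. f constant on the fibres of xi."""
    totals, sizes = {}, {}
    for i, x in enumerate(b):
        k = xi(i)
        totals[k] = totals.get(k, 0.0) + x
        sizes[k] = sizes.get(k, 0) + 1
    # On fibre k the optimal value is g_k = totals[k] / sizes[k].
    return 0.5 * sum(totals[k] ** 2 / sizes[k] for k in totals)


xi = lambda i: i // 2  # lump states {0, 1} and {2, 3}
b = [1.0, 3.0, 2.0, 2.0]
```

Here `coarse_sup(b, xi)` is strictly smaller than `full_sup(b)` because `b` varies within the first fibre, while the two values coincide when `b` is constant on every fibre. This is only a loose finite-dimensional analogue of the role played by local equilibrium in step (*).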

In the final step (*) one requires that the loss of information in passing from $\rho$ to $\hat\rho$ is consistent with the
loss of resolution in considering only functions $f = g\circ\xi$. This step requires a proof of local equilibrium, which
describes how the behaviour of $\rho$ that is not represented explicitly by the push-forward $\hat\rho$ can nonetheless
be deduced from $\hat\rho$. This local-equilibrium property is at the core of various coarse-graining methods and is
typically determined case by case.

We finally define $\hat I$ by duality in terms of $\hat{\mathcal{J}}$ as in (**). In a successful application of this method,
the resulting functional $\hat I$ has 'good' properties despite the loss of accuracy introduced by the
restriction to functions of the form $g\circ\xi$, and this fact acts as a test of success. Such good properties should
include, for instance, the property that $\hat I = 0$ has a unique solution in an appropriate sense.

Now let us explain the origin of the functionals $I^\varepsilon$.

1.2 Origin of the functional $I^\varepsilon$: large deviations of a stochastic particle system

The abstract methodology that we described above arises naturally in the context of large deviations, and we
now describe this in the context of the three examples that we discuss in the next section. All three originate
from (slight modifications of) one stochastic process, which models a collection of interacting particles with
inertia in the physical space $\mathbb{R}^d$:

$$dQ_i^n(t) = \frac{P_i^n(t)}{m}\,dt, \tag{6a}$$

$$dP_i^n(t) = -\nabla V(Q_i^n(t))\,dt - \frac{1}{n}\sum_{j=1}^n \nabla\psi\big(Q_i^n(t) - Q_j^n(t)\big)\,dt - \frac{\gamma}{m}P_i^n(t)\,dt + \sqrt{2\gamma\theta}\,dW_i(t). \tag{6b}$$

Here $Q_i^n \in \mathbb{R}^d$ and $P_i^n \in \mathbb{R}^d$ are the position and momentum of particles $i = 1,\dots,n$ with mass $m$.
Equation (6a) is the usual relation between $Q_i^n$ and $P_i^n$, and (6b) is a force balance which describes the
forces acting on the particle. For this system, corresponding to the first example below, these forces are (a) a
force arising from a fixed potential $V$, (b) an interaction force deriving from a potential $\psi$, (c) a friction force,
and (d) a stochastic force characterized by independent $d$-dimensional Wiener processes $W_i$. Throughout
this paper we collect $Q_i^n$ and $P_i^n$ into a single variable $X_i^n = (Q_i^n, P_i^n)$.
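
As an illustration of the particle system (6a)-(6b), the following sketch applies a plain Euler-Maruyama discretization in dimension $d = 1$. The quadratic choices $V(q) = q^2/2$ and $\psi(r) = r^2/2$, the Gaussian initial data, the step size and the function name `simulate_particles` are all assumptions made for this example only.

```python
import math
import random


def simulate_particles(n=10, steps=500, dt=1e-3, m=1.0, gamma=1.0, theta=1.0, seed=0):
    """Euler-Maruyama discretization of (6a)-(6b) in dimension d = 1.

    Assumed for illustration: V(q) = q^2/2 and psi(r) = r^2/2.
    """
    rng = random.Random(seed)
    grad_V = lambda q: q
    grad_psi = lambda r: r
    Q = [rng.gauss(0.0, 1.0) for _ in range(n)]
    P = [rng.gauss(0.0, 1.0) for _ in range(n)]
    noise_scale = math.sqrt(2.0 * gamma * theta * dt)
    for _ in range(steps):
        # Mean-field interaction force on each particle, evaluated at the old positions.
        interaction = [sum(grad_psi(Q[i] - Q[j]) for j in range(n)) / n for i in range(n)]
        new_Q = [Q[i] + P[i] / m * dt for i in range(n)]
        new_P = [
            P[i]
            - grad_V(Q[i]) * dt
            - interaction[i] * dt
            - gamma / m * P[i] * dt
            + noise_scale * rng.gauss(0.0, 1.0)
            for i in range(n)
        ]
        Q, P = new_Q, new_P
    return Q, P


Q, P = simulate_particles()
```

The friction coefficient `gamma / m` and the noise amplitude `sqrt(2 * gamma * theta * dt)` share the parameter `gamma`, mirroring the way friction and noise are coupled in (6b).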

The parameter $\gamma$ characterizes the intensity of collisions of the particle with the solvent; it is present in
both the friction term and the noise term, since they both arise from these collisions (and in accordance with
the Einstein relation). The parameter $\theta = kT_a$, where $k$ is the Boltzmann constant and $T_a$ is the absolute
temperature, measures the mean kinetic energy of the solvent molecules, and therefore characterizes the
magnitude of collision noise. Typical applications of this system are for instance as a simplified model
for chemical reactions, or as a model for particles interacting through Coulomb, gravitational, or
volume-exclusion forces. However, our focus in this paper is on methodology, not on technicality, so we will
later assume that $\psi$ is sufficiently smooth.

We now consider the many-particle limit $n \to \infty$ in (6). It is a well-known fact that the empirical measure

$$\rho_n(t) = \frac{1}{n}\sum_{i=1}^n \delta_{(Q_i^n(t),\,P_i^n(t))} \tag{7}$$

converges almost surely to the unique solution of the Vlasov-Fokker-Planck (VFP) equation [Oel84]

$$
\begin{aligned}
\partial_t\rho = (\mathcal{L}_\rho)^*\rho, \qquad (\mathcal{L}_\mu)^*\rho &:= -\operatorname{div}_q\Big(\rho\frac{p}{m}\Big) + \operatorname{div}_p\,\rho\Big(\nabla_q V + \nabla_q\psi*\mu + \gamma\frac{p}{m}\Big) + \gamma\theta\,\Delta_p\rho \\
&= -\operatorname{div}\,\rho J\nabla(H + \psi*\mu) + \gamma\operatorname{div}_p\,\rho\frac{p}{m} + \gamma\theta\,\Delta_p\rho,
\end{aligned}
\tag{8}
$$

with an initial datum that derives from the initial distribution of $X_i^n$. The spatial domain here is $\mathbb{R}^{2d}$ with
coordinates $(q,p) \in \mathbb{R}^d\times\mathbb{R}^d$, and subscripts such as in $\nabla_q$ and $\Delta_p$ indicate that differential operators act
only on the corresponding variables. The convolution is defined by
$(\psi*\rho)(q) = \int_{\mathbb{R}^{2d}}\psi(q-q')\,\rho(q',p')\,dq'dp'$.
In the second line above we use a slightly shorter way of writing $(\mathcal{L}_\mu)^*$ by introducing the Hamiltonian
$H(q,p) = p^2/2m + V(q)$ and the canonical symplectic matrix $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$. This way of writing also highlights
that the system is a combination of conservative effects, described by $J$, $H$, and $\psi$, and dissipative effects,
which are parametrized by $\gamma$. The primal form of the operator $(\mathcal{L}_\mu)^*$ is

$$\mathcal{L}_\mu f = J\nabla(H + \psi*\mu)\cdot\nabla f - \gamma\frac{p}{m}\cdot\nabla_p f + \gamma\theta\,\Delta_p f. \tag{9}$$

The almost-sure convergence of $\rho_n$ to the solution $\rho$ of the (deterministic) VFP equation is the starting
point for a large-deviation result. In particular it has been shown that the sequence $(\rho_n)$ has a large-deviation
property [DG87, BDF12, DPZ13] which characterizes the probability of finding the empirical measure far
from the limit $\rho$, written informally as

$$\mathrm{Prob}(\rho_n \approx \rho) \sim \exp\Big(-\frac{n}{2}I(\rho)\Big),$$

in terms of a rate functional $I : C([0,T];\mathcal{P}(\mathbb{R}^{2d})) \to \mathbb{R}$. If we assume that the initial data $X_i^n$ are chosen to
be deterministic, and such that the initial empirical measure $\rho_n(0)$ converges narrowly to some $\rho_0$, then $I$
has the form [DPZ13]

$$I(\rho) := \sup_{f\in C_b^{1,2}(\mathbb{R}\times\mathbb{R}^{2d})}\bigg[\int_{\mathbb{R}^{2d}} f_T\,d\rho_T - \int_{\mathbb{R}^{2d}} f_0\,d\rho_0 - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\big(\partial_t f + \mathcal{L}_{\rho_t}f\big)\,d\rho_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb{R}^{2d}}\Lambda(f,f)\,d\rho_t\,dt\bigg], \tag{10}$$

provided $\rho_t|_{t=0} = \rho_0$, where $\Lambda$ is the carré-du-champ operator (e.g. [BGL+14, Section 1.4.2])

$$\Lambda(f,g) := \frac12\big(\mathcal{L}_\mu(fg) - f\,\mathcal{L}_\mu g - g\,\mathcal{L}_\mu f\big) = \gamma\theta\,\nabla_p f\cdot\nabla_p g.$$

If the initial measure $\rho_t|_{t=0}$ is not equal to the limit $\rho_0$ of the stochastic initial empirical measures, then
$I(\rho) = \infty$.

Note that the functional $I$ in (10) is non-negative, since $f = 0$ is admissible. If $I(\rho) = 0$, then by replacing
$f$ by $\lambda f$ and letting $\lambda$ tend to zero we find that $\rho$ is the weak solution of (8) (which is unique, given initial
data $\rho_0$ [Fun84]). Therefore $I$ is of the form that we discussed in Section 1.1: $I \ge 0$, and $I(\rho) = 0$ iff $\rho$
solves (8), which is a realization of (1).
1.3 Concrete Problems

We now apply the coarse-graining method of Section 1.1 to three limits: the overdamped limit $\gamma \to \infty$, and
two small-noise limits $\theta \to 0$. In each of these three limits, the VFP equation (8) is the starting point, and we
prove convergence to a limiting system using appropriate coarse-graining maps. Note that the convergence
is therefore from one deterministic equation to another one; but the method makes use of the large-deviation
structure that the VFP equation has inherited from its stochastic origin.

1.3.1 Overdamped limit of the Vlasov-Fokker-Planck equation 

The first limit that we consider is the limit of large friction, $\gamma \to \infty$, in the Vlasov-Fokker-Planck equation
(8), setting $\theta = 1$ for convenience. To motivate what follows, we divide (8) throughout by $\gamma$ and formally let
$\gamma \to \infty$ to find

$$\operatorname{div}_p\Big(\rho\frac{p}{m}\Big) + \Delta_p\rho = 0,$$

which suggests that in the limit $\gamma \to \infty$, $\rho$ should be Maxwellian in $p$, i.e.

$$\rho_t(dq,dp) = Z^{-1}\exp\Big(-\frac{p^2}{2m}\Big)\,\sigma_t(dq)\,dp, \tag{11}$$

where $Z = (2m\pi)^{d/2}$ is the normalization constant for the Maxwellian distribution. The main result in
Section 2 shows that after an appropriate time rescaling, in the limit $\gamma \to \infty$, the remaining unknown
$\sigma \in C([0,T];\mathcal{P}(\mathbb{R}^d))$ solves the Vlasov-Fokker-Planck equation

$$\partial_t\sigma = \operatorname{div}(\sigma\nabla V(q)) + \operatorname{div}(\sigma(\nabla\psi*\sigma)) + \Delta\sigma. \tag{12}$$

In his seminal work [Kra40], Kramers formally discussed these results for the 'Kramers equation', which
corresponds to (8) with $\psi = 0$, and this limit has become known as the Smoluchowski-Kramers approximation.
Nelson made these ideas rigorous [Nel67] by studying the corresponding stochastic differential equations
(SDEs); he showed that under suitable rescaling the solution to the Langevin equation converges almost
surely to the solution of (12) with $\psi = 0$. Since then various generalizations and related results have been
proved [Fre04, CF06, Nar94, HVW12], mostly using stochastic and asymptotic techniques.

In this article we recover some of the results mentioned above for the VFP equation using the variational
technique described in Section 1.1. Our proof is made up of the following three steps. Theorem 2.4 provides
the necessary compactness properties to pass to the limit. Lemma 2.5 gives the characterization (11) of the
limit, and in Theorem 2.6 we prove the convergence of the solution of the VFP equation to the solution
of (12).

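1.3.2 Small-noise limit of a randomly perturbed Hamiltonian system with one degree of freedom

In our second example we consider the following equation

$$\partial_t\rho = -\operatorname{div}_q\Big(\rho\frac{p}{m}\Big) + \operatorname{div}_p(\rho\nabla_q V) + \varepsilon\Delta_p\rho \quad\text{on } [0,T]\times\mathbb{R}^2, \tag{13}$$

where $(q,p) \in \mathbb{R}^2$, $t \in [0,T]$ and $\operatorname{div}_q$, $\operatorname{div}_p$, $\Delta_p$ are one-dimensional derivatives. This equation can also be
written as

$$\partial_t\rho = -\operatorname{div}(\rho J\nabla H) + \varepsilon\Delta_p\rho \quad\text{on } [0,T]\times\mathbb{R}^2. \tag{14}$$

This corresponds to the VFP equation (8) with $\psi = 0$, without friction and with small noise $\varepsilon = \gamma\theta$.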
Figure 1: Simulation of (15) for varying $\varepsilon$: (a) $\varepsilon = 0.005$, (b) $\varepsilon = 0.00005$. Shown are the level curves of
the Hamiltonian $H$ and for each case a single trajectory.


In addition to the interpretation as the many-particle limit of (6), Equation (14) is also the forward
Kolmogorov equation of a randomly perturbed Hamiltonian system in $\mathbb{R}^2$ with Hamiltonian $H$:

$$dX_t = J\nabla H(X_t)\,dt + \sqrt{2\varepsilon}\begin{pmatrix} 0 \\ 1 \end{pmatrix} dW_t, \tag{15}$$

where $W_t$ is now a 1-dimensional Wiener process. When the amplitude $\varepsilon$ of the noise is small, the dynamics
(14) splits into fast and slow components. The fast component approximately follows an unperturbed
trajectory of the Hamiltonian system, which is a level set of $H$. The slow component is visible as a slow
modification of the value of $H$, corresponding to a motion transverse to the level sets of $H$. Figure 1 illustrates
this.
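
The separation into a fast rotation and a slow drift of $H$ can be observed numerically. The sketch below integrates a one-degree-of-freedom system with noise in the momentum only and records the value of $H$ along the path; the harmonic potential $V(q) = q^2/2$, the symplectic-Euler time stepping, the initial point and the function name `energy_path` are assumptions made for this illustration, and no time rescaling is applied.

```python
import math
import random


def energy_path(eps, steps=20000, dt=1e-3, m=1.0, seed=1):
    """Record H(q, p) = p^2/(2m) + q^2/2 along a noisy Hamiltonian path.

    Assumed for illustration: harmonic potential V(q) = q^2/2.
    """
    rng = random.Random(seed)
    q, p = 1.0, 0.0
    energies = [p * p / (2 * m) + q * q / 2]
    for _ in range(steps):
        # Noise acts on the momentum only, matching the eps * Delta_p term in (14).
        p += -q * dt + math.sqrt(2 * eps * dt) * rng.gauss(0.0, 1.0)
        q += p / m * dt  # symplectic Euler: the position update uses the new momentum
        energies.append(p * p / (2 * m) + q * q / 2)
    return energies


unperturbed = energy_path(eps=0.0)
perturbed = energy_path(eps=0.005)
```

With `eps=0.0` the recorded energy stays within a small discretization error of its initial value, so the path follows a level set of $H$; with `eps > 0` the energy additionally performs a slow random motion across level sets.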

Following [FW94] and others, in order to focus on the slow, Hamiltonian-changing motion, we rescale
time such that the Hamiltonian, level-set-following motion is fast, of rate $O(1/\varepsilon)$, and the level-set-changing
motion is of rate $O(1)$. In other words, the process (15) 'whizzes round' level sets of $H$ at rate $O(1/\varepsilon)$, while
shifting from one level set to another at rate $O(1)$.

This behaviour suggests choosing a coarse-graining map $\xi : \mathbb{R}^2 \to \Gamma$, which maps a whole level set to a
single point in a new space $\Gamma$; because of the structure of level sets of $H$, the set $\Gamma$ has a structure that is
called a graph, a union of one-dimensional intervals locally parametrized by the value of the Hamiltonian.
Figure 2 illustrates this, and in Section 3 we discuss it in full detail.

After projecting onto the graph $\Gamma$, the process turns out to behave like a diffusion process on $\Gamma$. This
property was first made rigorous in [FW94] for a system with one degree of freedom, as here, and
non-degenerate noise, using probabilistic techniques. In [FW98] the authors consider the case of degenerate noise
by using probabilistic and analytic techniques based on hypoelliptic operators. More recently this problem
has been handled using PDE techniques [IS12] (the elliptic case) and Dirichlet forms [BvR14]. In Section 3
we give a new proof, using the structure outlined in Section 1.1.

1.3.3 Small-noise limit of a randomly perturbed Hamiltonian system with d degrees of freedom

The convergence of solutions of (14) as $\varepsilon \to 0$ to a diffusion process on a graph requires that the non-perturbed
system has a unique invariant measure on each connected component of a level set. While this is true for a
Hamiltonian system with one degree of freedom, in the higher-dimensional case one might have additional
first integrals of motion. In such a system the slow component will not be a one-dimensional process but a
more complicated object; see [FW04]. However, by introducing an additional stochastic perturbation that
destroys all first integrals except the Hamiltonian, one can regain the necessary ergodicity, such that the
slow dynamics again lives on a graph.

In Section 4 we discuss this case. Equation (14) gains an additional noise term, and reads

$$\partial_t\rho = -\operatorname{div}(\rho J\nabla H) + \kappa\operatorname{div}(a\nabla\rho) + \varepsilon\Delta_p\rho, \tag{16}$$

where $a : \mathbb{R}^{2d} \to \mathbb{R}^{2d\times 2d}$ with $a\nabla H = 0$, $\dim(\mathrm{Kernel}(a)) = 1$, and $\kappa, \varepsilon > 0$ with $\kappa \gg \varepsilon$. The spatial domain
is $\mathbb{R}^{2d}$, $d > 1$, with coordinates $(q,p) \in \mathbb{R}^d\times\mathbb{R}^d$ and the unknown is a trajectory in the space of probability
measures $\rho : [0,T] \to \mathcal{P}(\mathbb{R}^{2d})$. As before the aim is to derive the dynamics as $\varepsilon \to 0$. This problem was
studied in [FW01] and the results closely mirror the previous case. The main difference lies in the proof of
the local equilibrium statement, which we discuss in Section 4.

1.4 Comparison with other work 

The novelty of the present paper lies in the following. 

1. In comparison with existing literature on the three concrete examples treated in this paper: The results
of the three examples are known in the literature (see for instance [Nel67, FW94, FW98, FW01]),
but they are proved by different techniques and in a different setting. The variational approach of
this paper to these problems, which has a clear microscopic interpretation from the large-deviation
principle, is new. We provide alternative proofs, recovering known results, in a unified framework. In
addition, we obtain all the results on compactness, local-equilibrium properties and liminf inequalities
solely from the variational structures. The approach is also applicable to approximate solutions, which
obey the original fine-grained dynamics only to some error. This allows us to work with a larger class of
measures and to relax many regularity conditions required by the exact solutions. Furthermore, our
abstract setting has potential applications to many other systems.

2. In comparison with recently developed variational-evolutionary methods: Many recently developed
variational techniques for 'passing to a limit', such as the Sandier-Serfaty method based on the $\Psi$-$\Psi^*$
structure [SS04, AMP+12, Mie14], only apply to gradient flows, i.e. dissipative systems. The approach
of this paper also applies to certain variational-evolutionary systems that include non-dissipative
effects, such as GENERIC systems [Ott05, DPZ13]; our examples illustrate this. Since our approach
only uses the duality structure of the rate functionals, which holds true for more general systems,
this method also works for other limits in non-gradient-flow systems such as the Langevin limit of the
Nosé-Hoover-Langevin thermostat [FG11, OP11, Sha17].

3. Quantification of the coarse-graining error. The use of the rate functional as a central ingredient in
'passing to a limit' and coarse-graining also allows us to obtain quantitative estimates of the
coarse-graining error. One intermediate result of our analysis is a functional inequality similar to the
energy-dissipation inequality in the gradient-flow setting (see (4)). This inequality provides an upper bound on
the free energy and the integral of the Fisher information by the rate functional and initial free energy.
To keep the paper to a reasonable length, we address this issue in detail separately in a companion
article [DLP+].

We provide further comments in Section 5. 

1.5 Outline of the article 

The rest of the paper is devoted to the study of three concrete problems: the overdamped limit of the VFP 
equation in Section 2, diffusion on a graph with one degree of freedom in Section 3, and diffusion on a graph 
with many degrees of freedom in Section 4. In each section, the main steps in the abstract framework are 
performed in detail. Section 5 provides further discussion. Finally, detailed proofs of some theorems are 
given in Appendices A and B. 

1.6 Summary of notation 


                                 ±1, depending on which end of edge $I_k$ the vertex $O_j$ lies          Sec. 3.1
$\mathcal{F}$                    Free energy                                                             (22), (46)
$\gamma$                         (Sec. 2) large-friction parameter
$\Gamma$, $\gamma$               (Sec. 3) The graph $\Gamma$ and its elements $\gamma$                   Sec. 3.1
$\mathcal{H}(\cdot|\cdot)$       relative entropy                                                        (21)
$H(q,p)$                         $H(q,p) = p^2/2m + V(q)$, the Hamiltonian
$\mathcal{H}^n$                  $n$-dimensional Hausdorff measure
$\mathcal{I}(\cdot|\cdot)$       relative Fisher Information                                             (24)
Int                              The interior of a set
$I^\varepsilon$                  Large-deviation rate functional for the diffusion-on-graph problem      (48)
$I^\gamma$                       Large-deviation rate functional for the VFP equation                    (19)
$J$                              $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$, the canonical symplectic matrix
$\mathcal{L}$                    Lebesgue measure
$\mathcal{L}_\mu$, $\mathcal{L}_\mu^*$   primal and dual generators                                      Sec. 1.2
$\mathcal{M}(\mathcal{X})$       space of finite, non-negative Borel measures on $\mathcal{X}$
$\mathcal{P}(\mathcal{X})$       space of probability measures on $\mathcal{X}$
$\hat\rho$                       push-forward under $\xi$ of $\rho$                                      (45)
$T(\gamma)$                      period of the periodic orbit at $\gamma \in \Gamma$                     (51)
$V(q)$                           potential on position ('on-site')
$x$                              $x = (q,p)$ joint variable
$\xi^\gamma$, $\xi$              coarse-graining maps                                                    (30), (44)


Throughout we use measure notation and terminology. For a given topological space $\mathcal{X}$, the space $\mathcal{M}(\mathcal{X})$
is the space of non-negative, finite Borel measures on $\mathcal{X}$; $\mathcal{P}(\mathcal{X})$ is the space of probability measures on $\mathcal{X}$. For
a measure $\rho \in \mathcal{M}([0,T]\times\mathbb{R}^{2d})$, for instance, we often write $\rho_t \in \mathcal{M}(\mathbb{R}^{2d})$ for the time slice at time $t$; we also
often use both the notation $\rho(x)dx$ and $\rho(dx)$ when $\rho$ is Lebesgue-absolutely-continuous. We equip $\mathcal{M}(\mathcal{X})$
and $\mathcal{P}(\mathcal{X})$ with the narrow topology, in which convergence is characterized by duality with continuous and
bounded functions on $\mathcal{X}$.


2 Overdamped Limit of the VFP equation 


2.1 Setup of the system 

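In this section we prove the large-friction limit $\gamma \to \infty$ of the VFP equation (8). Setting $\theta = 1$ for convenience,
and speeding time up by a factor $\gamma$, the VFP equation reads

$$\partial_t\rho = (\mathcal{L}^\gamma_\rho)^*\rho, \qquad (\mathcal{L}^\gamma_\nu)^*\rho := -\gamma\operatorname{div}\,\rho J\nabla(H + \psi*\nu) + \gamma^2\Big[\operatorname{div}_p\Big(\rho\frac{p}{m}\Big) + \Delta_p\rho\Big], \tag{17}$$

where, as before, $J = \begin{pmatrix} 0 & I \\ -I & 0 \end{pmatrix}$ and $H(q,p) = p^2/2m + V(q)$. The spatial domain is $\mathbb{R}^{2d}$ with coordinates
$(q,p) \in \mathbb{R}^d\times\mathbb{R}^d$ with $d \ge 1$, and $\rho \in C([0,T];\mathcal{P}(\mathbb{R}^{2d}))$. For later reference we also mention the primal form
of the operator $(\mathcal{L}^\gamma_\nu)^*$:

$$\mathcal{L}^\gamma_\nu f = \gamma J\nabla(H + \psi*\nu)\cdot\nabla f - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f. \tag{18}$$

We assume

(V1) The potential $V \in C^2(\mathbb{R}^d)$ has globally bounded second derivative. Furthermore $V \ge 0$,
$|\nabla V|^2 \le C(1 + V)$ for some $C > 0$, and $e^{-V} \in L^1(\mathbb{R}^d)$.

(V2) The interaction potential $\psi \in C^2(\mathbb{R}^d) \cap W^{1,1}(\mathbb{R}^d)$ is symmetric, has globally bounded first and second
derivatives, and the mapping $\nu \mapsto \int \nu*\psi\,d\nu$ is convex (or equivalently non-negative).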


As we described in Section 1.1, the study of the limit $\gamma \to \infty$ contains the following steps:

1. Prove compactness; 

2. Prove a local-equilibrium property; 

3. Prove a liminf inequality. 

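According to the framework detailed by (1), (2), each of these results is based on the large-deviation structure,
which for Equation (17) is associated to the functional $I^\gamma : C([0,T];\mathcal{P}(\mathbb{R}^{2d})) \to \mathbb{R}$ with

$$I^\gamma(\rho) = \sup_{f\in C_b^{1,2}(\mathbb{R}\times\mathbb{R}^{2d})}\bigg[\int_{\mathbb{R}^{2d}} f_T\,d\rho_T - \int_{\mathbb{R}^{2d}} f_0\,d\rho_0 - \int_0^T\!\!\int_{\mathbb{R}^{2d}}\big(\partial_t f_t + \mathcal{L}^\gamma_{\rho_t}f_t\big)\,d\rho_t\,dt - \frac{\gamma^2}{2}\int_0^T\!\!\int_{\mathbb{R}^{2d}}|\nabla_p f_t|^2\,d\rho_t\,dt\bigg], \tag{19}$$

where $\mathcal{L}^\gamma_\nu$ is given in (18). Alternatively the rate functional can be written as [DPZ13, Theorem 2.5]

$$I^\gamma(\rho) = \begin{cases} \displaystyle\frac12\int_0^T\!\!\int_{\mathbb{R}^{2d}}|h_t|^2\,d\rho_t\,dt & \text{if } \partial_t\rho_t = (\mathcal{L}^\gamma_{\rho_t})^*\rho_t - \gamma\operatorname{div}_p(\rho_t h_t) \text{ for } h \in L^2(0,T;L^2_\nabla(\rho)), \text{ and } \rho|_{t=0} = \rho_0, \\ +\infty & \text{otherwise,} \end{cases} \tag{20}$$

where $(\mathcal{L}^\gamma_\nu)^*$ is given in (17). For fixed $t$, the space $L^2_\nabla(\rho_t)$ is the closure of the set $\{\nabla_p\varphi : \varphi \in C_c^\infty(\mathbb{R}^{2d})\}$
in $L^2(\rho_t)$, the $\rho_t$-weighted $L^2$-space. Similarly, $L^2(0,T;L^2_\nabla(\rho))$ is defined as the closure of
$\{\nabla_p\varphi : \varphi \in C_c^\infty((0,T)\times\mathbb{R}^{2d})\}$ in the $L^2$-space associated to the space-time density $\rho$. This second form of the rate
functional shows clearly how $I^\gamma(\rho) = 0$ is equivalent to the property that $\rho$ solves the VFP equation (17). It
also shows that if $I^\gamma(\rho) > 0$, then $\rho$ is an approximate solution in the sense that it satisfies the VFP equation
up to some error $-\gamma\operatorname{div}_p(\rho_t h_t)$ whose norm is controlled by the rate functional.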


2.2 A priori bounds 

We give ourselves a sequence, indexed by $\gamma$, of solutions $\rho^\gamma$ to the VFP equation (17) with initial datum
$\rho^\gamma|_{t=0} = \rho^\gamma_0$. We will deduce the compactness of the sequence $\rho^\gamma$ from a priori estimates that are themselves
derived from the rate function $I^\gamma$.

For probability measures $\nu$, $\zeta$ on $\mathbb{R}^{2d}$ we first introduce:

• Relative entropy:

$$\mathcal{H}(\nu|\zeta) = \begin{cases} \displaystyle\int_{\mathbb{R}^{2d}} [f\log f]\,d\zeta & \text{if } \nu = f\zeta, \\ \infty & \text{otherwise.} \end{cases} \tag{21}$$

• The free energy for this system:

$$\mathcal{F}(\nu) := \mathcal{H}\big(\nu\,\big|\,Z_H^{-1}e^{-H}dx\big) + \frac12\int_{\mathbb{R}^{2d}}\psi*\nu\,d\nu = \int_{\mathbb{R}^{2d}}\Big[\log g + H + \frac12\psi*g\Big]g\,dx + \log Z_H, \tag{22}$$

where $Z_H = \int e^{-H}$ and the second expression makes sense whenever $\nu = g\,dx$.
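
For measures on a finite set, definition (21) reduces to a finite sum, which the following sketch evaluates. The function name `relative_entropy` and the representation of measures as lists of point masses are assumptions made for illustration; the measures in this paper live on $\mathbb{R}^{2d}$.

```python
import math


def relative_entropy(nu, zeta):
    """Discrete analogue of (21): sum of nu_i * log(nu_i / zeta_i).

    Returns infinity when nu is not absolutely continuous with respect to zeta.
    """
    total = 0.0
    for n, z in zip(nu, zeta):
        if n == 0.0:
            continue  # convention 0 * log 0 = 0
        if z == 0.0:
            return math.inf  # nu charges a point that zeta does not
        # With density f = n / z, the integrand f log f weighted by zeta equals n log(n / z).
        total += n * math.log(n / z)
    return total
```

The value is zero when both arguments coincide and infinite when the first measure is not absolutely continuous with respect to the second, matching the two cases in (21).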

The convexity of the term involving $\psi$ (condition (V2)) implies that the free energy $\mathcal{F}$ is strictly convex and
has a unique minimizer $\mu \in \mathcal{P}(\mathbb{R}^{2d})$. This minimizer is a stationary point of the evolution (17), and has the
implicit characterization

$$\mu \in \mathcal{P}(\mathbb{R}^{2d}): \qquad \mu(dqdp) = Z^{-1}\exp\big(-\big[H(q,p) + (\psi*\mu)(q)\big]\big)\,dqdp, \tag{23}$$

where $Z$ is the normalization constant for $\mu$. Note that $\nabla_p\mu = -\mu\nabla_pH = -p\mu/m$.

We also define the relative Fisher Information with respect to $\mu$ (in the $p$-variable only):

$$\mathcal{I}(\nu|\mu) = \sup_{\varphi\in C_c^\infty(\mathbb{R}^{2d})} 2\int_{\mathbb{R}^{2d}}\Big[\Delta_p\varphi - \frac{p}{m}\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Big]\,d\nu. \tag{24}$$

Note that the right-hand side of (24) depends on $\mu$ via $\nabla_p(\log\mu) = -\nabla_pH(q,p) = -p/m$. In the more
common case in which the derivatives $\Delta_p$ and $\nabla_p$ are replaced by the full derivatives $\Delta$ and $\nabla$, the relative
Fisher Information has an equivalent formulation in terms of the Lebesgue density of $\nu$. In our case such
equivalence only holds when $\nu$ is absolutely continuous with respect to the Lebesgue measure in both $q$
and $p$:

Lemma 2.1 (Equivalence of relative-Fisher-Information expressions for a.c. measures). If $\nu \in \mathcal{P}(\mathbb{R}^{2d})$,
$\nu(dx) = f(x)dx$ with $f \in L^1(\mathbb{R}^{2d})$, then

$$\mathcal{I}(\nu|\mu) = \begin{cases} \displaystyle\int_{\mathbb{R}^{2d}}\Big|\frac{\nabla_pf}{f}\mathbb{1}_{\{f>0\}} + \frac{p}{m}\Big|^2 f\,dqdp, & \text{if } \nabla_pf \in L^1_{\mathrm{loc}}(dqdp), \\ \infty & \text{otherwise,} \end{cases} \tag{25}$$

where $\mathbb{1}_{\{f>0\}}$ denotes the indicator function of the set $\{x \in \mathbb{R}^{2d} \mid f(x) > 0\}$ and $\nabla_pf$ is the distributional
gradient of $f$ in the $p$-variable only.

For a measure of the form $\zeta(dq)f(p)dp$, with $\zeta \not\ll dq$, the functional $\mathcal{I}$ in (24) may be finite while the
integral in (25) is not defined. Because of the central role of duality in this paper, definition (24) is a natural
one, as we shall see below. The proof of Lemma 2.1 is given in Appendix A.
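
As a numerical illustration of the density formula for the relative Fisher Information, the sketch below evaluates $\int |f'/f + p/m|^2 f\,dp$ by the trapezoidal rule for a centred Gaussian density $f$ of variance $s$ in the $p$-variable alone. Dropping the $q$-variable, working in dimension one, and the function name `relative_fisher_gaussian` are simplifying assumptions. For this family a direct computation gives the value $(1/m - 1/s)^2 s$, which vanishes exactly when $s = m$, i.e. when $f$ is the Maxwellian.

```python
import math


def relative_fisher_gaussian(s, m=1.0, half_width=12.0, n=20001):
    """Trapezoidal approximation of int |f'/f + p/m|^2 f dp for f = N(0, s), d = 1."""
    h = 2.0 * half_width / (n - 1)
    total = 0.0
    for i in range(n):
        p = -half_width + i * h
        f = math.exp(-p * p / (2.0 * s)) / math.sqrt(2.0 * math.pi * s)
        score = -p / s  # f'/f for a centred Gaussian of variance s
        weight = 0.5 if i in (0, n - 1) else 1.0
        total += weight * (score + p / m) ** 2 * f * h
    return total
```

For instance, `relative_fisher_gaussian(2.0)` should be close to $(1 - 1/2)^2 \cdot 2 = 0.5$ with $m = 1$, while `relative_fisher_gaussian(1.0)` vanishes because the integrand is identically zero.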

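In the introduction we mentioned that we expect $\rho^\gamma$ to become Maxwellian in the limit $\gamma \to \infty$. This
will be driven by a vanishing relative Fisher Information, as we shall see below. For absolutely continuous
measures, the characterization (25) already provides the property

$$\mathcal{I}(f\,dx|\mu) = 0 \quad\Longrightarrow\quad f(q,p) = \tilde f(q)\exp\Big(-\frac{p^2}{2m}\Big).$$

This property holds more generally:

Lemma 2.2 (Zero relative Fisher Information implies Maxwellian). If $\nu \in \mathcal{P}(\mathbb{R}^{2d})$ with $\mathcal{I}(\nu|\mu) = 0$, then
there exists $\sigma \in \mathcal{P}(\mathbb{R}^d)$ such that

$$\nu(dqdp) = Z^{-1}\exp\Big(-\frac{p^2}{2m}\Big)\,\sigma(dq)\,dp,$$

where $Z = \int_{\mathbb{R}^d} e^{-p^2/2m}\,dp$ is the normalization constant for the Maxwellian distribution.

Proof. From

$$\mathcal{I}(\nu|\mu) = \sup_{\varphi\in C_c^\infty(\mathbb{R}^{2d})} 2\int_{\mathbb{R}^{2d}}\Big[\Delta_p\varphi - \frac{p}{m}\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Big]\,d\nu = 0 \tag{26}$$

we conclude upon disintegrating $\nu$ as $\nu(dqdp) = \sigma(dq)\nu_q(dp)$,

$$\text{for } \sigma\text{-a.e. } q: \qquad \sup_{\phi\in C_c^\infty(\mathbb{R}^d)}\int_{\mathbb{R}^d}\Big(\Delta_p\phi - \frac{p}{m}\cdot\nabla_p\phi - \frac12|\nabla_p\phi|^2\Big)\,\nu_q(dp) = 0.$$

By replacing $\phi$ by $\lambda\phi$, $\lambda > 0$, and taking $\lambda \to 0$ we find

$$\int_{\mathbb{R}^d}\Big(\Delta_p\phi - \frac{p}{m}\cdot\nabla_p\phi\Big)\,\nu_q(dp) = 0,$$

which is the weak form of an elliptic equation on $\mathbb{R}^d$ with unique solution (see e.g. [BKRS15, Theorem
4.1.11])

$$\nu_q(dp) = Z^{-1}\exp\Big(-\frac{p^2}{2m}\Big)\,dp.$$

This proves the lemma. □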


In the following theorem we give the central a priori estimate, in which free energy and relative Fisher
Information are bounded from above by the rate functional and the free energy at initial time.

Theorem 2.3 (A priori bounds). Fix $\gamma > 0$ and let $\rho \in C([0,T];\mathcal{P}(\mathbb{R}^{2d}))$ with $\rho_t|_{t=0} =: \rho_0$ satisfy

$$I^\gamma(\rho) < \infty, \qquad \mathcal{F}(\rho_0) < \infty. \tag{27}$$

Then for any $t \in [0,T]$ we have

$$\mathcal{F}(\rho_t) + \frac{\gamma^2}{2}\int_0^t \mathcal{I}(\rho_s|\mu)\,ds \le I^\gamma(\rho) + \mathcal{F}(\rho_0). \tag{28}$$

From (28) we obtain the separate inequality

$$\frac12\int_{\mathbb{R}^{2d}} H\,d\rho_t \le \mathcal{F}(\rho_0) + I^\gamma(\rho) + \log\frac{Z_{H/2}}{Z_H}. \tag{29}$$

This estimate will lead to a priori bounds in two ways. First, the bound (29) gives tightness estimates, and
therefore compactness in space and time (Theorem 2.4); secondly, by (28), the relative Fisher Information
is bounded by $C/\gamma^2$ and therefore vanishes in the limit $\gamma \to \infty$. This fact is used to prove that the limiting
measure is Maxwellian (Lemma 2.5).

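Proof. We give a heuristic motivation here; Appendix B contains a full proof. Given a trajectory $\rho$ as in the
theorem, note that by (20) $\rho$ satisfies

\[ \partial_t\rho_t = -\gamma\,\mathrm{div}\bigl(\rho_t J\nabla(H+\psi*\rho_t)\bigr) + \gamma^2\Bigl(\mathrm{div}_p\Bigl(\rho_t\frac{p}{m}\Bigr) + \Delta_p\rho_t\Bigr) - \gamma\,\mathrm{div}_p(\rho_t h_t), \quad\text{with } h \in L^2(0,T;L^2_\nabla(\rho)). \]

We then formally calculate

\[ \begin{aligned}
\partial_t\mathcal F(\rho_t) &= \int_{\mathbb R^{2d}} \bigl[\log\rho_t + 1 + H + \psi*\rho_t\bigr]\Bigl(-\gamma\,\mathrm{div}\,\rho_t J\nabla(H+\psi*\rho_t) + \gamma^2\Bigl(\mathrm{div}_p\,\rho_t\frac{p}{m} + \Delta_p\rho_t\Bigr) - \gamma\,\mathrm{div}_p\,\rho_t h_t\Bigr) \\
&= -\gamma^2\int_{\mathbb R^{2d}} \frac{1}{\rho_t}\Bigl|\nabla_p\rho_t + \rho_t\frac{p}{m}\Bigr|^2 + \gamma\int_{\mathbb R^{2d}} \frac{1}{\rho_t}\Bigl(\nabla_p\rho_t + \rho_t\frac{p}{m}\Bigr)\cdot\rho_t h_t \\
&\le -\frac{\gamma^2}{2}\int_{\mathbb R^{2d}} \frac{1}{\rho_t}\Bigl|\nabla_p\rho_t + \rho_t\frac{p}{m}\Bigr|^2 + \frac12\int_{\mathbb R^{2d}} \rho_t|h_t|^2,
\end{aligned} \]

where the first $O(\gamma)$ term cancels because of the anti-symmetry of $J$. After integration in time this latter
expression yields (28).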

For exact solutions of the VFP equation, i.e. when $I^\gamma(\rho) = 0$, this argument can be made rigorous
following e.g. [BCS97]. However, the fairly low regularity of the right-hand side in (20) prevents these
techniques from working. 'Mild' solutions, defined using the variation-of-constants formula and the Green
function for the hypoelliptic operator, are not well-defined either, for the same reason: the term $\iint \nabla_p G\cdot h\,d\rho$
that appears in such an expression is generally not integrable. In the appendix we give a different proof,
using the method of dual equations.

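Equation (29) follows by substituting

\[ \mathcal F(\rho_t) = \mathcal H\bigl(\rho_t\,\big|\,Z_{H/2}^{-1}e^{-H/2}dx\bigr) + \frac12\int_{\mathbb R^{2d}} H\,d\rho_t + \frac12\int_{\mathbb R^{2d}} \psi*\rho_t\,d\rho_t + \log\frac{Z_H}{Z_{H/2}} \]

in (28), where $Z_{H/2} := \int_{\mathbb R^{2d}} e^{-H/2}$. □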


2.3 Coarse-graining and compactness 

As we described in the introduction, in the overdamped limit $\gamma \to \infty$ we expect that $\rho$ will resemble a
Maxwellian distribution $Z^{-1}\exp(-p^2/2m)\,\sigma_t(dq)$, and that the $q$-dependent part $\sigma$ will solve equation (12).
We will prove this statement using the method described in Section 1.1.

It would be natural to define 'coarse-graining' in this context as the projection $\xi(q,p) := q$, since that
should eliminate the fast dynamics of $p$ and focus on the slower dynamics of $q$. However, this choice fails: it
completely decouples the dynamics of $q$ from that of $p$, thereby preventing the noise in $p$ from transferring
to $q$. Following the lead of Kramers [Kra40], therefore, we define a slightly different coarse-graining map

\[ \xi^\gamma : \mathbb R^{2d} \to \mathbb R^d, \qquad \xi^\gamma(q,p) := q + \frac{p}{\gamma}. \tag{30} \]

In the limit $\gamma \to \infty$, $\xi^\gamma \to \xi$ locally uniformly, recovering the projection onto the $q$-coordinate.

The theorem below gives the compactness properties of the solutions $\rho^\gamma$ of the rescaled VFP equation
that allow us to pass to the limit. There are two levels of compactness, a weaker one in the original space
and a stronger one in the coarse-grained space $\mathbb R^d = \xi^\gamma(\mathbb R^{2d})$. This is similar to other multilevel compactness
results as in e.g. [GOVW09].

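Theorem 2.4 (Compactness). Let a sequence $\rho^\gamma \in C([0,T];\mathcal P(\mathbb R^{2d}))$ satisfy for a suitable constant $C > 0$
and every $\gamma$ the estimate

\[ I^\gamma(\rho^\gamma) + \mathcal F(\rho^\gamma_t|_{t=0}) \le C. \tag{31} \]

Then there exists a subsequence (not relabeled) such that

1. $\rho^\gamma \to \rho$ in $\mathcal M([0,T]\times\mathbb R^{2d})$ with respect to the narrow topology.

2. $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho$ in $C([0,T];\mathcal P(\mathbb R^d))$ with respect to the uniform topology in time and narrow topology on $\mathcal P(\mathbb R^d)$.

For a.e. $t \in [0,T]$ the limit $\rho_t$ satisfies

\[ I(\rho_t|\mu) = 0. \tag{32} \]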

Proof. To prove part 1, note that the positivity of the convolution integral involving $\psi$ and the free-energy-
dissipation inequality (28) imply that $\mathcal H(\rho^\gamma_t|Z_H^{-1}e^{-H}dx)$ is bounded uniformly in $t$ and $\gamma$. By an argument
as in [ASZ09, Prop. 4.2] this implies that the set of space-time measures $\{\rho^\gamma : \gamma > 1\}$ is tight, from which
compactness in $\mathcal M([0,T]\times\mathbb R^{2d})$ follows.

To prove (32) we remark that

\[ 0 \le \sup_{\varphi\in C_c^\infty(\mathbb R\times\mathbb R^{2d})} 2\int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl[\Delta_p\varphi - \frac{p}{m}\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Bigr]\,d\rho^\gamma_t\,dt \le \int_0^T I(\rho^\gamma_t|\mu)\,dt \le \frac{C}{\gamma^2} \to 0, \]

and by passing to the limit on the left-hand side we find

\[ \sup_{\varphi\in C_c^\infty(\mathbb R\times\mathbb R^{2d})} 2\int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl[\Delta_p\varphi - \frac{p}{m}\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Bigr]\,d\rho_t\,dt = 0. \]

By disintegrating $\rho$ in time as $\rho(dt\,dq\,dp) = \rho_t(dq\,dp)\,dt$, we find that $I(\rho_t|\mu) = 0$ for (Lebesgue-) almost all $t$.

We prove part 2 with the Arzelà-Ascoli theorem. For any $t \in [0,T]$ the sequence $\xi^\gamma_\#\rho^\gamma_t$ is tight, which
follows from the tightness of $\rho^\gamma_t$ proved above and the local uniform convergence $\xi^\gamma \to \xi$ (see e.g. [AGS08,
Lemma 5.2.1]).

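To prove equicontinuity we will show that

\[ \sup_{\gamma>1}\;\sup_{t\in[0,T-h]}\;\sup_{\substack{\varphi\in C_c^2(\mathbb R^d)\\ \|\varphi\|_{C^2(\mathbb R^d)}\le 1}} \int_{\mathbb R^d} \varphi\,\bigl(\xi^\gamma_\#\rho^\gamma_{t+h} - \xi^\gamma_\#\rho^\gamma_t\bigr) \xrightarrow{h\to0} 0. \tag{33} \]

In fact, (33) is a direct consequence of the following stronger statement

\[ \int_{\mathbb R^d} \varphi\,\bigl(\xi^\gamma_\#\rho^\gamma_{t+h} - \xi^\gamma_\#\rho^\gamma_t\bigr) \le C\|\nabla\varphi\|_\infty\sqrt h \tag{34} \]

with $C$ independent of $t$, $\gamma$ and $\varphi$. Note that (34) in particular implies a uniform 1/2-Hölder estimate with
respect to the $L^1$-Wasserstein distance.

Let us now give the proof of (34). Indeed, the boundedness of the rate functional, definition (20), and
tightness of $\rho^\gamma$ imply that there exists some $h^\gamma \in L^2(0,T;L^2_\nabla(\rho^\gamma))$ with

\[ \partial_t\rho^\gamma_t = (\mathscr L^\gamma_{\rho^\gamma_t})^*\rho^\gamma_t - \gamma\,\mathrm{div}_p(\rho^\gamma_t h^\gamma_t) \tag{35} \]

in duality with $C_c^2(\mathbb R^{2d})$, pointwise almost everywhere in $t \in [0,T]$. Therefore for any $f \in C_c^2(\mathbb R^{2d})$ we have
in the sense of distributions on $[0,T]$,

\[ \frac{d}{dt}\int_{\mathbb R^{2d}} f\,\rho^\gamma_t = \int_{\mathbb R^{2d}} \Bigl(\gamma\frac{p}{m}\cdot\nabla_q f - \gamma\nabla_q V\cdot\nabla_p f - \gamma\nabla_p f\cdot(\nabla_q\psi*\rho^\gamma_t) - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f + \gamma\nabla_p f\cdot h^\gamma_t\Bigr)\,d\rho^\gamma_t. \]

To prove (34), make the choice $f = \varphi\circ\xi^\gamma$ for $\varphi \in C_c^2(\mathbb R^d)$ and integrate over $[t,t+h]$. Note that due to the
specific form of $\xi^\gamma(q,p) = q + p/\gamma$ the terms $\gamma\frac{p}{m}\cdot\nabla_q f$ and $\gamma^2\frac{p}{m}\cdot\nabla_p f$ cancel and therefore

\[ \int_{\mathbb R^d} \varphi\,\bigl(\xi^\gamma_\#\rho^\gamma_{t+h} - \xi^\gamma_\#\rho^\gamma_t\bigr) = \int_t^{t+h}\!\!\int_{\mathbb R^{2d}} \Bigl(-\nabla V(q)\cdot\nabla\varphi\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla\varphi\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot(\nabla_q\psi*\rho^\gamma_s)(q) + \Delta\varphi\Bigl(q+\frac{p}{\gamma}\Bigr) + \nabla\varphi\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot h^\gamma_s(q,p)\Bigr)\,d\rho^\gamma_s\,ds. \]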

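We estimate the first term on the right-hand side by using Hölder's inequality and growth condition (V1),

\[ \begin{aligned}
\Bigl|\int_t^{t+h}\!\!\int_{\mathbb R^{2d}} \nabla V(q)\cdot\nabla\varphi\Bigl(q+\frac{p}{\gamma}\Bigr)\,d\rho^\gamma_s\,ds\Bigr| &\le \|\nabla\varphi\|_\infty\sqrt h\,\Bigl(\int_t^{t+h}\!\!\int_{\mathbb R^{2d}} |\nabla V(q)|^2\,d\rho^\gamma_s\,ds\Bigr)^{1/2} \\
&\le \|\nabla\varphi\|_\infty\sqrt h\,\Bigl(\int_t^{t+h}\!\!\int_{\mathbb R^{2d}} C(1+V(q))\,d\rho^\gamma_s\,ds\Bigr)^{1/2} \le \tilde C\|\nabla\varphi\|_\infty\sqrt h,
\end{aligned} \]

where the last inequality follows from the free-energy-dissipation inequality (28). For the second term we
use $\|\nabla_q\psi*\rho^\gamma_s\|_\infty \le \|\nabla_q\psi\|_\infty$, and the last term is estimated by Hölder's inequality,

\[ \Bigl|\int_t^{t+h}\!\!\int_{\mathbb R^{2d}} \nabla\varphi\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot h^\gamma_s(q,p)\,d\rho^\gamma_s\,ds\Bigr| \le \|\nabla\varphi\|_\infty\sqrt h\,\Bigl(\int_t^{t+h}\!\!\int_{\mathbb R^{2d}} |h^\gamma_s|^2\,d\rho^\gamma_s\,ds\Bigr)^{1/2} \le \|\nabla\varphi\|_\infty\sqrt h\,\bigl(2I^\gamma(\rho^\gamma)\bigr)^{1/2} \le C\|\nabla\varphi\|_\infty\sqrt h. \]

To sum up we have

\[ \int_{\mathbb R^d} \varphi\,\bigl(\xi^\gamma_\#\rho^\gamma_{t+h} - \xi^\gamma_\#\rho^\gamma_t\bigr) \le C\|\nabla\varphi\|_\infty\sqrt h \xrightarrow{h\to0} 0, \]

where $C$ is independent of $t$, $\gamma$ and $\varphi$.

Thus by the Arzelà-Ascoli theorem there exists a $\nu \in C([0,T];\mathcal P(\mathbb R^d))$ such that $\xi^\gamma_\#\rho^\gamma \to \nu$ with respect
to the uniform topology in time and narrow topology on $\mathcal P(\mathbb R^d)$. Since $\rho^\gamma \to \rho$ in $\mathcal M([0,T]\times\mathbb R^{2d})$ and $\xi^\gamma \to \xi$
locally uniformly, we have $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho$ in $\mathcal M([0,T]\times\mathbb R^d)$ (again using [AGS08, Lemma 5.2.1]), implying
that $\nu = \xi_\#\rho$. This concludes the proof of Theorem 2.4. □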


2.4 Local equilibrium 

A central step in any coarse-graining method is the treatment of the information that is 'lost' upon coarse-graining.
The lemma below uses the a priori estimate (28) to reconstruct this information, which for this
system means showing that $\rho^\gamma$ becomes Maxwellian in $p$ as $\gamma \to \infty$.

Lemma 2.5 (Local equilibrium). Under the assumptions of Theorem 2.4, let $\rho^\gamma \to \rho$ in $\mathcal M([0,T]\times\mathbb R^{2d})$
with respect to the narrow topology and $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho$ in $C([0,T];\mathcal P(\mathbb R^d))$ with respect to the uniform topology
in time and narrow topology on $\mathcal P(\mathbb R^d)$. Then there exists $\sigma \in C([0,T];\mathcal P(\mathbb R^d))$, $\sigma(dt\,dq) = \sigma_t(dq)\,dt$, such
that for almost all $t \in [0,T]$,

\[ \rho_t(dq\,dp) = Z^{-1}\exp\Bigl(-\frac{p^2}{2m}\Bigr)\,\sigma_t(dq)\,dp, \tag{36} \]

where $Z = \int_{\mathbb R^d} e^{-p^2/2m}\,dp$ is the normalization constant for the Maxwellian distribution. Furthermore $\xi^\gamma_\#\rho^\gamma \to \sigma$
uniformly in time and narrowly on $\mathcal P(\mathbb R^d)$.

Proof. Since $\rho^\gamma \to \rho$ narrowly in $\mathcal M([0,T]\times\mathbb R^{2d})$, the limit $\rho$ also has the disintegration structure $\rho(dt\,dp\,dq) = \rho_t(dp\,dq)\,dt$,
with $\rho_t \in \mathcal P(\mathbb R^{2d})$. From the a priori estimate (28) and the duality definition of $I$ we have
$I(\rho_t|\mu) = 0$ for almost all $t$, and the characterization (36) then follows from Lemma 2.2. The uniform
in time convergence of $\xi^\gamma_\#\rho^\gamma$ implies $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho = \sigma$ uniformly in time and narrowly on $\mathcal P(\mathbb R^d)$, and the
regularity $\sigma \in C([0,T];\mathcal P(\mathbb R^d))$. □


2.5 Liminf inequality 

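The final step in the variational technique is proving an appropriate liminf inequality which also provides
the structure of the limiting coarse-grained evolution. The following theorem makes this step rigorous.
Define the (limiting) functional $I : C([0,T];\mathcal P(\mathbb R^d)) \to \mathbb R$ by

\[ I(\sigma) := \sup_{g\in C_c^{1,2}(\mathbb R\times\mathbb R^d)} \int_{\mathbb R^d} g_T\,d\sigma_T - \int_{\mathbb R^d} g_0\,d\sigma_0 - \int_0^T\!\!\int_{\mathbb R^d} \bigl(\partial_t g - \nabla V\cdot\nabla g - (\nabla\psi*\sigma)\cdot\nabla g + \Delta g\bigr)\,d\sigma_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb R^d} |\nabla g|^2\,d\sigma_t\,dt. \tag{37} \]

Note that $I \ge 0$ (since $g = 0$ is admissible); we have the equivalence

\[ I(\sigma) = 0 \quad\Longleftrightarrow\quad \partial_t\sigma = \mathrm{div}\bigl(\sigma\nabla V(q)\bigr) + \mathrm{div}\bigl(\sigma(\nabla\psi*\sigma)\bigr) + \Delta\sigma \quad\text{in } [0,T]\times\mathbb R^d. \]

Theorem 2.6 (Liminf inequality). Under the same conditions as in Theorem 2.4 we assume that $\rho^\gamma \to \rho$
narrowly in $\mathcal M([0,T]\times\mathbb R^{2d})$ and $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho = \sigma$ in $C([0,T];\mathcal P(\mathbb R^d))$. Then

\[ \liminf_{\gamma\to\infty} I^\gamma(\rho^\gamma) \ge I(\sigma). \]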

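Proof. Write the large deviation rate functional $I^\gamma : C([0,T];\mathcal P(\mathbb R^{2d})) \to \mathbb R$ in (19) as

\[ I^\gamma(\rho) = \sup_{f\in C_c^{1,2}(\mathbb R\times\mathbb R^{2d})} \mathcal J^\gamma(\rho,f), \tag{38} \]

where

\[ \begin{aligned}
\mathcal J^\gamma(\rho,f) = {}& \int_{\mathbb R^{2d}} f_T\,d\rho_T - \int_{\mathbb R^{2d}} f_0\,d\rho_0 - \int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl(\partial_t f + \gamma\frac{p}{m}\cdot\nabla_q f - \gamma\nabla_q V\cdot\nabla_p f - \gamma\nabla_p f\cdot(\nabla_q\psi*\rho_t) \\
& - \gamma^2\frac{p}{m}\cdot\nabla_p f + \gamma^2\Delta_p f\Bigr)\,d\rho_t\,dt - \frac{\gamma^2}{2}\int_0^T\!\!\int_{\mathbb R^{2d}} |\nabla_p f|^2\,d\rho_t\,dt.
\end{aligned} \]

Define $\mathcal A := \{f = g\circ\xi^\gamma \text{ with } g \in C_c^{1,2}(\mathbb R\times\mathbb R^d)\}$. Then we have

\[ I^\gamma(\rho^\gamma) \ge \sup_{f\in\mathcal A} \mathcal J^\gamma(\rho^\gamma,f), \]

and

\[ \begin{aligned}
\mathcal J^\gamma(\rho^\gamma, g\circ\xi^\gamma) = {}& \int_{\mathbb R^{2d}} g_T\circ\xi^\gamma\,d\rho^\gamma_T - \int_{\mathbb R^{2d}} g_0\circ\xi^\gamma\,d\rho^\gamma_0 - \int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl[\partial_t(g\circ\xi^\gamma) - \nabla_q V(q)\cdot\nabla g\Bigl(q+\frac{p}{\gamma}\Bigr) \\
& + \Delta g\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\Bigr]\,d\rho^\gamma_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb R^{2d}} |\nabla g\circ\xi^\gamma|^2\,d\rho^\gamma_t\,dt.
\end{aligned} \tag{39} \]

Note how the specific dependence of $\xi^\gamma(q,p) = q + p/\gamma$ has caused the coefficients $\gamma$ and $\gamma^2$ in the
expression above to vanish. Adding and subtracting $\nabla V(q+p/\gamma)\cdot\nabla g(q+p/\gamma)$ in (39) and defining $\hat\rho^\gamma := \xi^\gamma_\#\rho^\gamma$,
$\mathcal J^\gamma$ can be rewritten as

\[ \begin{aligned}
\mathcal J^\gamma(\rho^\gamma, g\circ\xi^\gamma) = {}& \int_{\mathbb R^d} g_T\,d\hat\rho^\gamma_T - \int_{\mathbb R^d} g_0\,d\hat\rho^\gamma_0 - \int_0^T\!\!\int_{\mathbb R^d} \bigl(\partial_t g - \nabla V\cdot\nabla g + \Delta g\bigr)(\zeta)\,\hat\rho^\gamma_t(d\zeta)\,dt - \frac12\int_0^T\!\!\int_{\mathbb R^d} |\nabla g|^2\,d\hat\rho^\gamma_t\,dt \\
& - \int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl(\nabla V\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla V(q)\Bigr)\cdot\nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\,d\rho^\gamma_t\,dt + \int_0^T\!\!\int_{\mathbb R^{2d}} \nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt.
\end{aligned} \tag{40} \]

We now show that (40) converges to the right-hand side of (37), term by term. Since $\xi^\gamma_\#\rho^\gamma \to \xi_\#\rho = \sigma$
narrowly in $\mathcal M([0,T]\times\mathbb R^d)$ and $g \in C_c^{1,2}(\mathbb R\times\mathbb R^d)$ we have

\[ \int_0^T\!\!\int_{\mathbb R^d} \Bigl(\partial_t g - \nabla V\cdot\nabla g + \Delta g + \frac12|\nabla g|^2\Bigr)\,d\hat\rho^\gamma_t\,dt \xrightarrow{\gamma\to\infty} \int_0^T\!\!\int_{\mathbb R^d} \Bigl(\partial_t g - \nabla V\cdot\nabla g + \Delta g + \frac12|\nabla g|^2\Bigr)\,d\sigma_t\,dt. \]

Taylor expansion of $\nabla V$ around $q$ and estimate (29) give

\[ \Bigl|\int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl(\nabla V\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla V(q)\Bigr)\cdot\nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\,d\rho^\gamma_t\,dt\Bigr| \le \|D^2V\|_\infty\|\nabla g\|_\infty\sqrt T\,\Bigl(\int_0^T\!\!\int_{\mathbb R^{2d}} \frac{p^2}{\gamma^2}\,d\rho^\gamma_t\,dt\Bigr)^{1/2} \le \frac{C}{\gamma} \xrightarrow{\gamma\to\infty} 0. \]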
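Adding and subtracting $\nabla g(q)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)$ in (40) we find

\[ \begin{aligned}
\int_0^T\!\!\int_{\mathbb R^{2d}} \nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt = {}& \int_0^T\!\!\int_{\mathbb R^{2d}} \nabla g(q)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt \\
& + \int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl[\nabla g\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla g(q)\Bigr]\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt.
\end{aligned} \]

Since $\rho^\gamma \to \rho$ we have $\rho^\gamma\otimes\rho^\gamma \to \rho\otimes\rho$ and therefore, passing to the limit in the first term and using the
local-equilibrium characterization of Lemma 2.5, we obtain

\[ \int_0^T\!\!\int_{\mathbb R^{2d}} \nabla g(q)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt \xrightarrow{\gamma\to\infty} \int_0^T\!\!\int_{\mathbb R^d} \nabla g\cdot(\nabla\psi*\sigma_t)\,d\sigma_t\,dt. \]

For the second term we calculate

\[ \Bigl|\int_0^T\!\!\int_{\mathbb R^{2d}} \Bigl[\nabla g\Bigl(q+\frac{p}{\gamma}\Bigr) - \nabla g(q)\Bigr]\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt\Bigr| \le \|D^2g\|_\infty\|\nabla_q\psi\|_\infty\sqrt T\,\Bigl(\int_0^T\!\!\int_{\mathbb R^{2d}} \frac{p^2}{\gamma^2}\,d\rho^\gamma_t\,dt\Bigr)^{1/2} \le \frac{C}{\gamma} \xrightarrow{\gamma\to\infty} 0. \]

Therefore

\[ \int_0^T\!\!\int_{\mathbb R^{2d}} \nabla g\Bigl(q+\frac{p}{\gamma}\Bigr)\cdot(\nabla_q\psi*\rho^\gamma_t)(q)\,d\rho^\gamma_t\,dt \xrightarrow{\gamma\to\infty} \int_0^T\!\!\int_{\mathbb R^d} \nabla g\cdot(\nabla\psi*\sigma_t)\,d\sigma_t\,dt. \]

□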


2.6 Discussion 

The ingredients of the convergence proof above are, as mentioned before, (a) a compactness result, (b) a
local-equilibrium result, and (c) a liminf inequality. All three follow from the large-deviation structure,
through the rate functional $I^\gamma$. We now comment on these.

Compactness. Compactness in the sense of measures is, both for $\rho^\gamma$ and for $\xi^\gamma_\#\rho^\gamma$, a simple consequence
of the confinement provided by the growth of $H$. In Theorem 2.4 we provide a stronger statement for $\xi^\gamma_\#\rho^\gamma$,
by showing continuity in time, in order for the limiting functional $I(\sigma)$ in (37) to be well defined. This
continuity depends on the boundedness of $I^\gamma$.

Local equilibrium. The local-equilibrium statement depends crucially on the structure of $I^\gamma$, and more
specifically on the large coefficient $\gamma^2$ multiplying the derivatives in $p$. This coefficient also ends up as a
prefactor of the relative Fisher Information in the a priori estimate (28), and through this estimate it drives
the local-equilibrium result.

Liminf inequality. As remarked in the introduction, the duality structure of $I^\gamma$ is the key to the liminf
inequality, as it allows for relatively weak convergence of $\rho^\gamma$ and $\xi^\gamma_\#\rho^\gamma$. The role of the local equilibrium is to
allow us to replace the $p$-dependence in some of the integrals by the Maxwellian dependence, and therefore
to reduce all terms to dependence on the macroscopic information only.

As we have shown, the choice of the coarse-graining map has the advantage that it has caused the (large)
coefficients $\gamma$ and $\gamma^2$ in the expression of the rate functionals to vanish. In other words, it cancels out the
inertial effects and transforms a Laplacian in the $p$ variable to a Laplacian in the coarse-grained variable while
rescaling it to be of order 1. The choice $\xi(q,p) = q$, on the other hand, would lose too much information by
completely discarding the diffusion.
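The cancellation described in this discussion can be checked numerically. The sketch below is an illustration only: it works in one space dimension ($d = 1$), picks a test function $\varphi$ supplied by the caller, and uses central finite differences with a hypothetical step size `eps`. It evaluates, for $f = \varphi\circ\xi^\gamma$ with $\xi^\gamma(q,p) = q + p/\gamma$, the inertial combination $\gamma\frac{p}{m}\partial_q f - \gamma^2\frac{p}{m}\partial_p f$ and the rescaled diffusion term $\gamma^2\partial_{pp} f$.

```python
import math

def generator_terms(phi, q, p, gamma, m=1.0, eps=1e-3):
    """Finite-difference evaluation, for f(q, p) = phi(q + p/gamma), of
    (a) gamma*(p/m)*d_q f - gamma^2*(p/m)*d_p f   (inertial terms), and
    (b) gamma^2 * d_pp f                           (rescaled p-Laplacian).
    The step size eps and the 1D setting are illustrative assumptions."""
    f = lambda q_, p_: phi(q_ + p_ / gamma)
    d_q = (f(q + eps, p) - f(q - eps, p)) / (2.0 * eps)
    d_p = (f(q, p + eps) - f(q, p - eps)) / (2.0 * eps)
    d_pp = (f(q, p + eps) - 2.0 * f(q, p) + f(q, p - eps)) / eps ** 2
    inertial = gamma * p / m * d_q - gamma ** 2 * p / m * d_p
    diffusion = gamma ** 2 * d_pp
    return inertial, diffusion

# Example with phi = sin: the inertial terms cancel (up to discretisation
# error), and the diffusion term approximates phi''(q + p/gamma).
inertial, diffusion = generator_terms(math.sin, q=0.3, p=0.7, gamma=50.0)
```

With the plain projection $\xi(q,p) = q$ instead, $\partial_p f$ and $\partial_{pp} f$ would vanish identically, so the term $\gamma\frac{p}{m}\partial_q f$ would remain uncompensated and no diffusion term would survive, in line with the remark above.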
Figure 3: Left: Hamiltonian $\mathbb R^2 \ni (q,p) \mapsto H(q,p)$. Right: Graph $\Gamma$.


3 Diffusion on a graph in one dimension 

In this section we derive the small-noise limit of a randomly perturbed Hamiltonian system, which corresponds
to passing to the limit $\varepsilon \to 0$ in (14). In terms of a rescaled time, in order to focus on the time scale
of the noise, equation (14) becomes

\[ \partial_t\rho^\varepsilon = -\frac{1}{\varepsilon}\,\mathrm{div}(\rho^\varepsilon J\nabla H) + \Delta_p\rho^\varepsilon. \tag{41} \]

Here $\rho^\varepsilon \in C([0,T];\mathcal P(\mathbb R^2))$, $J = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}$ is again the canonical symplectic matrix, $\Delta_p$ is the Laplacian in the
$p$-direction, and the equation holds in the sense of distributions. The Hamiltonian $H \in C^2(\mathbb R^{2d};\mathbb R)$ is again
defined by $H(q,p) = p^2/2m + V(q)$ for some potential $V : \mathbb R^d \to \mathbb R$. We make the following assumptions
(that we formulate on $H$ for convenience):

(A1) $H \ge 0$, and $H$ is coercive, i.e. $H(x) \to \infty$ as $|x| \to \infty$;

(A2) $|\nabla H|, |\Delta H|, |\nabla_p H|^2 \le C(1+H)$;

(A3) $H$ has a finite number of non-degenerate (i.e. non-singular Hessian) saddle points $O_1,\dots,O_n$ with
$H(O_i) \ne H(O_j)$ for every $i,j \in \{1,\dots,n\}$, $i \ne j$.

As explained in the introduction, and in contrast to the VFP equation of the previous section, equation (41)
has two equally valid interpretations: as a PDE in its own right, or as the Fokker-Planck (forward
Kolmogorov) equation of the stochastic process

\[ dX^\varepsilon_t = \frac{1}{\varepsilon}J\nabla H(X^\varepsilon_t)\,dt + \sqrt2\begin{pmatrix} 0 \\ 1 \end{pmatrix}dW_t. \tag{42} \]

For the sequel we will think of $\rho^\varepsilon$ as the law of the process $X^\varepsilon_t$; although this is not strictly necessary, it
helps in illustrating the ideas.

3.1 Construction of the graph $\Gamma$

As mentioned in the introduction, the dynamics of (41) has two time scales when $0 < \varepsilon \ll 1$, a fast and a
slow one. The fast time scale, of scale $\varepsilon$, is described by the (deterministic) equation

\[ \dot x = \frac{1}{\varepsilon}J\nabla H(x) \quad\text{in } \mathbb R^2, \tag{43} \]

whereas the slow time scale, of order 1, is generated by the noise term.

The solutions of (43) follow level sets of $H$. There exist three types of such solutions: stationary ones,
periodic orbits, and homoclinic orbits. Stationary solutions of (43) correspond to stationary points of $H$
(where $\nabla H = 0$); periodic orbits to connected components of level sets along which $\nabla H \ne 0$; and homoclinic
orbits to components of level sets of $H$ that are terminated on each end by a stationary point. Since we
have assumed in (A3) that there is at most one stationary point in each level set, heteroclinic orbits do not
exist, and the orbits necessarily connect a stationary point with itself.

Looking ahead towards coarse-graining, we define $\Gamma$ to be the set of all connected components of level sets
of $H$, and we identify $\Gamma$ with a union of one-dimensional line segments, as shown in Figure 3. Each periodic
orbit corresponds to an interior point of one of the edges of $\Gamma$; the vertices of $\Gamma$ correspond to connected
components of level sets containing a stationary point of $H$. Each saddle point $O$ corresponds to a vertex
connected by three edges.

For practical purposes we also introduce a coordinate system on $\Gamma$. We represent the edges by closed
intervals $I_k \subset \mathbb R$, and number them with numbers $k = 1,2,\dots,n$; the pair $(h,k)$ is then a coordinate for a
point $\gamma \in \Gamma$, if $k$ is the index of the edge containing $\gamma$, and $h$ the value of $H$ on the level set represented
by $\gamma$. For a vertex $O \in \Gamma$, we write $O \sim I_k$ if $O$ is at one end of edge $I_k$; we use the shorthand notation
$\pm_{kj}$ to mean $1$ if $O_j$ is at the upper end of $I_k$, and $-1$ in the other case. Note that if $O \sim I_{k_1}$, $O \sim I_{k_2}$ and
$O \sim I_{k_3}$ and $h_0$ is the value of $H$ at the point corresponding to $O$, then the coordinates $(h_0,k_1)$, $(h_0,k_2)$
and $(h_0,k_3)$ correspond to the same point $O$. With a slight abuse of notation, we also define the function
$k : \mathbb R^2 \to \{1,\dots,n\}$ as the index of the edge $I_k \subset \Gamma$ corresponding to the component containing $(q,p)$.

The rigorous construction of the graph $\Gamma$ and the topology on it has been done several times [FW93,
FW94, BvR14]; for our purposes it suffices to note that (a) inside each edge, the usual topology and geometry
of $\mathbb R^1$ apply, and (b) across the whole graph there is a natural concept of distance, and therefore of continuity.
It will be practical to think of functions $f : \Gamma \to \mathbb R$ as defined on the disjoint union $\sqcup_k I_k$. A function $f : \Gamma \to \mathbb R$
is then called well-defined if it is a single-valued function on $\Gamma$ (i.e., it takes the same value on those vertices
that are multiply represented). A well-defined function $f : \Gamma \to \mathbb R$ is continuous if $f|_{I_k} \in C(I_k)$ for every $k$.

We also define a concept of differentiability of a function $f : \Gamma \to \mathbb R$. A subgraph of $\Gamma$ is defined as
any union of edges such that each interior vertex connects exactly two edges, one from above and one from
below, i.e., a subtree without bifurcations. A continuous function on $\Gamma$ is called differentiable on $\Gamma$ if it is
differentiable on each of its subgraphs.

Finally, in order to integrate over $\Gamma$, we write $d\gamma$ for the measure on $\Gamma$ which is defined on each $I_k$ as the
local Lebesgue measure $dh$. Whenever we write $\int_\Gamma$, this should be interpreted as $\sum_k \int_{I_k}$.

3.2 Adding noise: diffusion on the graph 

In the noisy evolution (42), for small but finite $\varepsilon > 0$, the evolution follows fast trajectories that nearly
coincide with the level sets of $H$; the noise breaks the conservation of $H$, and causes a slower drift of $X^\varepsilon_t$
across the levels of $H$. In order to remove the fast deterministic dynamics, we now define the coarse-graining
map as

\[ \xi : \mathbb R^2 \to \Gamma, \qquad \xi(q,p) := \bigl(H(q,p), k(q,p)\bigr), \tag{44} \]

where the mapping $k : \mathbb R^2 \to \{1,\dots,n\}$ indexes the edges of the graph, as above.
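To make the map (44) concrete, here is a minimal sketch for one specific Hamiltonian that is not taken from the paper: the double-well $H(q,p) = p^2/2m + \frac14(q^2-1)^2$, which has a single saddle at the origin with $H = 1/4$. The edge numbering (1 = left well, 2 = right well, 3 = the edge above the saddle level) and the convention of assigning the saddle level itself to edge 3 are assumptions made here for illustration.

```python
def hamiltonian(q, p, m=1.0):
    """Illustrative double-well Hamiltonian H = p^2/(2m) + (q^2 - 1)^2 / 4."""
    return p * p / (2.0 * m) + 0.25 * (q * q - 1.0) ** 2

# Energy of the saddle point (q, p) = (0, 0) for this choice of potential.
H_SADDLE = 0.25

def xi(q, p, m=1.0):
    """Coarse-graining map (q, p) -> (H(q, p), k(q, p)) for the double well.
    Edge labels are a hypothetical convention:
      k = 1: closed orbits in the left well  (H < H_SADDLE, q < 0),
      k = 2: closed orbits in the right well (H < H_SADDLE, q > 0),
      k = 3: orbits encircling both wells    (H >= H_SADDLE)."""
    h = hamiltonian(q, p, m)
    if h < H_SADDLE:
        # Below the saddle level q cannot be 0, so the sign of q
        # identifies the well.
        k = 1 if q < 0 else 2
    else:
        k = 3
    return h, k

left_bottom = xi(-1.0, 0.0)   # bottom of the left well
right_bottom = xi(1.0, 0.0)   # bottom of the right well
above = xi(0.0, 1.0)          # a point on an orbit around both wells
```

Two phase-space points with the same energy below the saddle level but in different wells are mapped to different points of $\Gamma$, which is exactly why the edge index $k$ is needed in addition to $H$.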

We now consider the process $\xi(X^\varepsilon_t)$, which contains no fast dynamics. For each finite $\varepsilon > 0$, $\xi(X^\varepsilon_t)$ is
not a Markov process; but as $\varepsilon \to 0$, the fast movement should result in a form of averaging, such that the
influence of the missing information vanishes; then the limit process is a diffusion on the graph $\Gamma$.

The results of this section are stated and proved in terms of the corresponding objects $\rho^\varepsilon$ and $\hat\rho^\varepsilon$, where
$\hat\rho^\varepsilon$ is the push-forward

\[ \hat\rho^\varepsilon := \xi_\#\rho^\varepsilon, \tag{45} \]

as explained in Section 1.1, and similar to Section 2. The corresponding statement about $\rho^\varepsilon$ and $\hat\rho^\varepsilon$ is that $\hat\rho^\varepsilon$
should converge to some $\hat\rho$, which in the limit satisfies a (convection-) diffusion equation on $\Gamma$. Theorems 3.2
and 3.6 make this statement precise.
3.3 Compactness


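As in the case of the VFP equation, equation (41) has a free energy, which in this case is simply the
Boltzmann entropy

\[ \mathcal F(\rho) = \int_{\mathbb R^2} \rho\log\rho\,d\mathcal L^2, \tag{46} \]

where $\mathcal L^2$ denotes the two-dimensional Lebesgue measure in $\mathbb R^2$.

The corresponding 'relative' Fisher Information is the same as the Fisher Information in the $p$-variable,

\[ I(\rho|\mathcal L^2) := \sup_{\varphi\in C_c^\infty(\mathbb R^2)} 2\int_{\mathbb R^2} \Bigl[\Delta_p\varphi - \frac12|\nabla_p\varphi|^2\Bigr]\,d\rho, \tag{47} \]

and satisfies for $\rho = f\mathcal L^2$,

\[ I(f\mathcal L^2|\mathcal L^2) = \int_{\mathbb R^2} |\nabla_p\log f|^2\,f\,dq\,dp, \]

whenever this is finite.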

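The large deviation functional $I^\varepsilon : C([0,T];\mathcal P(\mathbb R^2)) \to \mathbb R$ is given by

\[ I^\varepsilon(\rho) = \sup_{f\in C_c^{1,2}(\mathbb R\times\mathbb R^2)} \Bigl[\int_{\mathbb R^2} f_T\,d\rho_T - \int_{\mathbb R^2} f_0\,d\rho_0 - \int_0^T\!\!\int_{\mathbb R^2} \Bigl(\partial_t f + \frac{1}{\varepsilon}J\nabla H\cdot\nabla f + \Delta_p f\Bigr)\,d\rho_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb R^2} |\nabla_p f|^2\,d\rho_t\,dt\Bigr]. \tag{48} \]

For fixed $\varepsilon > 0$, $\rho^\varepsilon$ solves (41) iff $I^\varepsilon(\rho^\varepsilon) = 0$.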

The following theorem states the relevant a priori estimates in this setting. 

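Theorem 3.1 (A priori estimates). Let $\varepsilon > 0$ and let $\rho \in C([0,T];\mathcal P(\mathbb R^2))$ with $\rho_t|_{t=0} =: \rho_0$ satisfy

\[ I^\varepsilon(\rho) + \mathcal F(\rho_0) + \int_{\mathbb R^2} H\,d\rho_0 \le C. \]

Then for any $t \in [0,T]$ we have

\[ \int_{\mathbb R^2} H\,d\rho_t < C', \tag{49} \]

where $C' > 0$ depends on $C$ but is independent of $\varepsilon$. Furthermore, for any $t \in [0,T]$ we have

\[ \mathcal F(\rho_t) + \frac12\int_0^t I(\rho_s|\mathcal L^2)\,ds \le I^\varepsilon(\rho) + \mathcal F(\rho_0). \tag{50} \]

See Appendix D for a proof of Theorem 3.1.

Note that the estimate (50) implies that $\mathcal F(\rho_t) = \mathcal H(\rho_t|\mathcal L^2)$ is finite for all $t$, and therefore $\rho_t$ is Lebesgue
absolutely continuous. We will therefore often write $\rho_t(x)$ for the Lebesgue density of $\rho_t$. In addition, the
integral of the relative Fisher Information is also bounded: $0 \le \int_0^t I(\rho_s|\mathcal L^2)\,ds \le C$.

The next result summarizes the compactness properties for any sequence $\rho^\varepsilon$ with $\sup_\varepsilon I^\varepsilon(\rho^\varepsilon) < \infty$.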

Theorem 3.2 (Compactness). Let a sequence $\rho^\varepsilon \in C([0,T];\mathcal P(\mathbb R^2))$ with $\rho^\varepsilon_t|_{t=0} =: \rho^\varepsilon_0$ satisfy for a constant
$C > 0$ and all $\varepsilon > 0$ the estimate

\[ I^\varepsilon(\rho^\varepsilon) + \mathcal F(\rho^\varepsilon_0) + \int_{\mathbb R^2} H\,d\rho^\varepsilon_0 \le C. \]

Then there exist subsequences (not relabelled) such that

1. $\rho^\varepsilon \to \rho$ in $\mathcal M([0,T]\times\mathbb R^2)$ in the narrow topology;

2. $\hat\rho^\varepsilon \to \hat\rho = \xi_\#\rho$ in $C([0,T];\mathcal P(\Gamma))$ with respect to the uniform topology in time and narrow topology on
$\mathcal P(\Gamma)$.

Finally, we have the estimate

\[ \frac12\int_0^t I(\rho_s|\mathcal L^2)\,ds \le C \quad\text{for all } t \in [0,T]. \]

The sequence $\rho^\varepsilon$ is tight in $\mathcal M([0,T]\times\mathbb R^2)$ by estimate (49), which implies part 1. The proof of part 2
is similar to part 2 in Theorem 2.4, and the final estimate is a direct consequence of (50).


3.4 Local equilibrium 

Theorem 3.2 states that $\rho^\varepsilon$ converges narrowly on $[0,T]\times\mathbb R^2$ to some $\rho$. In fact we need a stronger statement,
in which the behaviour of $\rho$ on each connected component of a level set of $H$ is fully determined by the limit $\hat\rho$.

Lemma 3.3 below makes this statement precise. Before proceeding we define $T : \Gamma \to \mathbb R$ as

\[ T(\gamma) := \int_{\xi^{-1}(\gamma)} \frac{\mathscr H^1(dx)}{|\nabla H(x)|}, \tag{51} \]

where $\mathscr H^1$ is the one-dimensional Hausdorff measure. $T$ has a natural interpretation as the period of the
periodic orbit of the deterministic equation (43) corresponding to $\gamma$. When $\gamma$ is an interior vertex, such that
the orbit is homoclinic, not periodic, $T(\gamma) = +\infty$. $T$ also has a second natural interpretation: the measure
$T(\gamma)\,d\gamma = T(h,k)\,dh$ on $\Gamma$ is the push-forward under $\xi$ of the Lebesgue measure on $\mathbb R^2$, and the measure
$T(\gamma)\,d\gamma$ therefore appears in various places.

Lemma 3.3 (Local Equilibrium). Under the assumptions of Theorem 3.2, let $\rho^\varepsilon \to \rho$ in $\mathcal M([0,T]\times\mathbb R^2)$
with respect to the narrow topology. Let $\hat\rho$ be the push-forward $\xi_\#\rho$ of the limit $\rho$, as above.

Then for a.e. $t$, the limit $\rho_t$ is absolutely continuous with respect to the Lebesgue measure, and $\hat\rho_t$ is absolutely
continuous with respect to the measure $T(\gamma)\,d\gamma$, where $T(\gamma)$ is defined in (51). Writing

\[ \rho_t(dx) = \rho_t(x)\,dx \quad\text{and}\quad \hat\rho_t(d\gamma) = \alpha_t(\gamma)\,T(\gamma)\,d\gamma, \]

we have

\[ \rho_t(x) = \alpha_t(\xi(x)) \quad\text{for almost all } x \in \mathbb R^2 \text{ and } t \in [0,T]. \tag{52} \]

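Proof. From the boundedness of $I^\varepsilon(\rho^\varepsilon)$ and the narrow convergence $\rho^\varepsilon \to \rho$ we find, passing to the limit in
the rate functional (48), for any $f \in C_c^{1,2}(\mathbb R\times\mathbb R^2)$

\[ \int_0^T\!\!\int_{\mathbb R^2} J\nabla H\cdot\nabla f\,d\rho_t\,dt = 0. \tag{53} \]

Now choose any $\varphi \in C_c^{1,2}([0,T]\times\mathbb R^2)$ and any $\zeta \in C_c^2(\Gamma)$ such that $\zeta$ is constant in a neighbourhood of
each vertex; then the function $f(t,x) = \zeta(\xi(x))\varphi(t,x)$ is well-defined and in $C_c^{1,2}([0,T]\times\mathbb R^2)$. We substitute
this special function in (53); since $J\nabla H\cdot\nabla(\zeta\circ\xi) = 0$, we have $J\nabla H\cdot\nabla f = (\zeta\circ\xi)\,J\nabla H\cdot\nabla\varphi$. Applying the
disintegration theorem to $\rho$, writing $\rho_t(dx) = \hat\rho_t(d\gamma)\,\tilde\rho_t(dx|\gamma)$ with $\operatorname{supp}\tilde\rho_t(\cdot|\gamma) \subset \xi^{-1}(\gamma)$, we obtain

\[ 0 = \int_0^T\!\!\int_\Gamma \zeta(\gamma)\,\hat\rho_t(d\gamma)\int_{\xi^{-1}(\gamma)} \nabla\varphi\cdot\frac{J\nabla H}{|\nabla H|}\,|\nabla H|\,\tilde\rho_t(\cdot|\gamma)\,dt = \int_0^T\!\!\int_\Gamma \zeta(\gamma)\,\hat\rho_t(d\gamma)\int_{\xi^{-1}(\gamma)} \partial_\tau\varphi\,|\nabla H|\,\tilde\rho_t(\cdot|\gamma)\,dt, \]

where $\partial_\tau$ is the tangential derivative. By varying $\zeta$ and $\varphi$ we conclude that for $\hat\rho$-almost every $(\gamma,t)$,
$|\nabla H|\,\tilde\rho_t(\cdot|\gamma) = c_{\gamma,t}\,\mathscr H^1\lfloor_{\xi^{-1}(\gamma)}$ for some $\gamma,t$-dependent constant $c_{\gamma,t} > 0$, and since $\tilde\rho$ is normalized, we find that

\[ \text{for } \hat\rho\text{-a.e. } (\gamma,t): \quad \tilde\rho_t(dx|\gamma) = \frac{1}{T(\gamma)\,|\nabla H(x)|}\,\mathscr H^1\lfloor_{\xi^{-1}(\gamma)}(dx). \tag{54} \]

This also implies that $\tilde\rho_t(\cdot|\gamma)$ is in fact $t$-independent.

For measurable $f$ we now compare the two relations

\[ \begin{aligned}
\int_{\mathbb R^2} f\,d\rho_t &= \int_{\mathbb R^2} f(y)\rho_t(y)\,dy = \int_\Gamma d\gamma\int_{\xi^{-1}(\gamma)} \frac{f(y)\rho_t(y)}{|\nabla H(y)|}\,\mathscr H^1(dy), \\
\int_{\mathbb R^2} f\,d\rho_t &= \int_\Gamma \hat\rho_t(d\gamma)\int_{\xi^{-1}(\gamma)} f(y)\,\tilde\rho_t(dy|\gamma) = \int_\Gamma \frac{\hat\rho_t(d\gamma)}{T(\gamma)}\int_{\xi^{-1}(\gamma)} \frac{f(y)}{|\nabla H(y)|}\,\mathscr H^1(dy),
\end{aligned} \]

where we have used the co-area formula in the first line and (54) in the second one. Since $f$ was arbitrary,
(52) follows for almost all $t$. □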

3.5 Continuity of $\rho$ and $\hat\rho$

As a consequence of the local-equilibrium property (52) and the boundedness of the Fisher Information, we
will show in the following that $\rho$ and its push-forward $\hat\rho$ satisfy an important continuity property. We first
motivate this property heuristically.

The local-equilibrium result Lemma 3.3 states that the limit measure $\rho$ depends on $x$ only through $\xi(x)$.
Take any measure $\rho \in \mathcal P(\mathbb R^2)$ of that form, i.e. $\rho(dx) = f(\xi(x))\,dx$, with finite free energy and finite relative
Fisher Information. Setting $\tilde f = f\circ\xi$, by Lemma 2.1, $\nabla_p\tilde f$ is well-defined and locally integrable.



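Consider a section $\Omega_\varepsilon$ of the $(q,p)$-plane as shown in Figure 4, bounded by $q = a$ and $q = b$ and level sets
$H = h$ and $H = h + \varepsilon$. The top and bottom boundaries $\gamma$ and $\gamma_\varepsilon$ correspond to elements of $\Gamma$ that we also
call $\gamma$ and $\gamma_\varepsilon$; they might be part of the same edge $k$ of the graph, or they might belong to different edges.
As $\varepsilon \to 0$, $\gamma_\varepsilon$ converges to $\gamma$.

By simple integration we find that

\[ \int_{\Omega_\varepsilon} \nabla_p\tilde f = \int_{\gamma_\varepsilon\cup\gamma} \tilde f\,n_p\,d\tau = \bigl(f(\gamma_\varepsilon) - f(\gamma)\bigr)(b-a), \]

where $d\tau$ is the scalar line element and $n_p$ the $p$-component of the normal $n$. Applying Hölder's inequality
we find

\[ |b-a|\,|f(\gamma_\varepsilon) - f(\gamma)| \le \int_{\Omega_\varepsilon} |\nabla_p\tilde f| \le \Bigl(\int_{\Omega_\varepsilon} \frac{|\nabla_p\tilde f|^2}{\tilde f}\Bigr)^{1/2}\Bigl(\int_{\Omega_\varepsilon} \tilde f\Bigr)^{1/2} \xrightarrow{\varepsilon\to0} 0. \]

This argument shows that $f$ is continuous from the right at the point $\gamma \in \Gamma$.

The following lemma generalizes this argument to the case at hand, in which $\rho$ also depends on time.
Note that $\operatorname{Int}\Gamma$ is the interior of the graph $\Gamma$, which is $\Gamma$ without the lower exterior vertices.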


Lemma 3.4 (Continuity of $\rho$). Let $\rho \in \mathcal P([0,T]\times\mathbb R^2)$, $\rho(dt\,dx) = f(t,\xi(x))\,dt\,dx$ for a Borel measurable
$f : [0,T]\times\Gamma \to \mathbb R$, and assume that

\[ \int_0^T I(\rho_t|\mathcal L^2)\,dt + \sup_{t\in[0,T]} \mathcal F(\rho_t) < \infty. \]

Then for almost all $t \in [0,T]$, $\gamma \mapsto f(t,\gamma)$ is continuous on $\operatorname{Int}\Gamma$.

Proof. The argument is essentially the same as the one above. For almost all $t$, $\rho_t$ is Lebesgue-absolutely-continuous
and $I(\rho_t|\mathcal L^2)$ is finite, and the argument above can be applied to the neighbourhood of any point $x$
with $\nabla H(x) \ne 0$, and to both right and left limits. The only elements of $\Gamma$ that have no representative $x \in \mathbb R^2$
with $\nabla H(x) \ne 0$ are the lower ends of the graph, corresponding to the bottoms of the wells of $H$. At all
other points of $\Gamma$ we obtain continuity. □

Corollary 3.5 (Continuity of $\hat\rho$). Let $\rho$ be the limit given by Theorem 3.2, and $\hat\rho := \xi_\#\rho$ its push-forward.
For almost all $t$, $\hat\rho_t \ll T(\gamma)\,d\gamma$, and $d\hat\rho_t/T(\gamma)\,d\gamma$ is continuous on $\operatorname{Int}\Gamma$.

This corollary follows by combining Lemma 3.4 with Lemma 3.3.


3.6 Liminf inequality 

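We now derive the final ingredient of the proof, the liminf inequality. Define

\[ I(\hat\rho) := \begin{cases} \displaystyle\sup_{g\in C_c^{1,2}(\mathbb R\times\Gamma)} \mathcal J(\hat\rho,g) & \text{if } \hat\rho_t \ll T(\gamma)\,d\gamma,\ \hat\rho_t(d\gamma) = f_t(\gamma)T(\gamma)\,d\gamma \text{ with } f_t \text{ continuous on } \operatorname{Int}\Gamma, \text{ for almost all } t \in [0,T], \\ +\infty & \text{otherwise}, \end{cases} \tag{55} \]

where

\[ \mathcal J(\hat\rho,g) := \int_\Gamma g_T\,d\hat\rho_T - \int_\Gamma g_0\,d\hat\rho_0 - \int_0^T\!\!\int_\Gamma \bigl(\partial_t g_t(\gamma) + A(\gamma)g_t''(\gamma) + B(\gamma)g_t'(\gamma)\bigr)\,\hat\rho_t(d\gamma)\,dt - \frac12\int_0^T\!\!\int_\Gamma A(\gamma)\bigl(g_t'(\gamma)\bigr)^2\,\hat\rho_t(d\gamma)\,dt, \tag{56} \]

and we use $g'$ and $g''$ to indicate derivatives with respect to $h$. For $\gamma \in \Gamma$, the coefficients are defined by

\[ A(\gamma) := \frac{1}{T(\gamma)}\int_{\xi^{-1}(\gamma)} \frac{(\nabla_p H)^2}{|\nabla H|}\,d\mathscr H^1, \qquad B(\gamma) := \frac{1}{T(\gamma)}\int_{\xi^{-1}(\gamma)} \frac{\Delta_p H}{|\nabla H|}\,d\mathscr H^1, \qquad T(\gamma) := \int_{\xi^{-1}(\gamma)} \frac{1}{|\nabla H|}\,d\mathscr H^1. \tag{57} \]

Note that for our particular choice of $H(q,p) = p^2/2m + V(q)$, we have $B(\gamma) = 1/m$.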
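The coefficients in (57) can be computed explicitly in simple cases. The sketch below does this numerically for the harmonic oscillator $H(q,p) = p^2/2m + q^2/2$, a choice made here purely for illustration: it has no saddle points, so its graph $\Gamma$ is a single edge, and it is not an example from the paper. The level set $\{H = h\}$ is parametrised as an ellipse and the integrals are approximated by the midpoint rule; the function name and the number of quadrature nodes are arbitrary.

```python
import math

def orbit_averages(h, m=1.0, n=2000):
    """Return (T, A, B) from (57) on the level set {H = h} of the
    illustrative Hamiltonian H = p^2/(2m) + q^2/2 (requires h > 0).
    The level set is the ellipse q = sqrt(2h) cos(th), p = sqrt(2mh) sin(th)."""
    T = TA = TB = 0.0
    dtheta = 2.0 * math.pi / n
    for i in range(n):
        th = (i + 0.5) * dtheta
        q = math.sqrt(2.0 * h) * math.cos(th)
        p = math.sqrt(2.0 * m * h) * math.sin(th)
        # Arc-length element of the parametrised ellipse (Hausdorff measure).
        ds = math.hypot(math.sqrt(2.0 * h) * math.sin(th),
                        math.sqrt(2.0 * m * h) * math.cos(th)) * dtheta
        grad = math.hypot(q, p / m)           # |grad H| = |(q, p/m)|
        T += ds / grad
        TA += (p / m) ** 2 * ds / grad        # weight (d_p H)^2
        TB += (1.0 / m) * ds / grad           # weight Delta_p H = 1/m
    return T, TA / T, TB / T

T, A, B = orbit_averages(0.8, m=2.0)
```

For this Hamiltonian the exact values are $T = 2\pi\sqrt m$ (the oscillator period), $A(h) = h/m$ and $B = 1/m$, so $TA = 2\pi h/\sqrt m$ and $(TA)' = 2\pi/\sqrt m = TB$. The last identity is an instance of the relation $(TA)' = TB$ between these coefficients that the paper uses later in this section.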

The class of test functions in (55) is $C_c^{1,2}(\mathbb R\times\Gamma)$; recall that differentiability of a function $f : \Gamma \to \mathbb R$ is
defined by restriction to one-dimensional subgraphs, and $C_c^{1,2}(\mathbb R\times\Gamma)$ therefore consists of functions $g : \Gamma \to \mathbb R$
that are twice continuously differentiable in $h$ in this sense. The subscript $c$ indicates that we restrict to
functions that vanish for sufficiently large $h$ (i.e. somewhere along the top edge of $\Gamma$).

Note that again $I \ge 0$; formally, $I(\hat\rho) = 0$ iff $\hat\rho$ satisfies the diffusion equation

\[ \partial_t\hat\rho = (A\hat\rho)'' - (B\hat\rho)', \]

and we will investigate this equation in more detail in the next section.

Theorem 3.6 (Liminf inequality). Under the same assumptions as in Theorem 3.2, let $\rho^\varepsilon \to \rho$ in $\mathcal M([0,T]\times\mathbb R^2)$
and $\hat\rho^\varepsilon := \xi_\#\rho^\varepsilon \to \xi_\#\rho =: \hat\rho$ in $C([0,T];\mathcal P(\Gamma))$. Then

\[ \liminf_{\varepsilon\to0} I^\varepsilon(\rho^\varepsilon) \ge I(\hat\rho). \tag{58} \]


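Proof. Recall the rate functional from (48)

\[ I^\varepsilon(\rho^\varepsilon) = \sup_{f\in C_c^{1,2}(\mathbb R\times\mathbb R^2)} \mathcal J^\varepsilon(\rho^\varepsilon,f), \quad\text{where} \]

\[ \mathcal J^\varepsilon(\rho^\varepsilon,f) := \int_{\mathbb R^2} f_T\,d\rho^\varepsilon_T - \int_{\mathbb R^2} f_0\,d\rho^\varepsilon_0 - \int_0^T\!\!\int_{\mathbb R^2} \Bigl(\partial_t f + \frac{1}{\varepsilon}J\nabla H\cdot\nabla f + \Delta_p f\Bigr)\,d\rho^\varepsilon_t\,dt - \frac12\int_0^T\!\!\int_{\mathbb R^2} |\nabla_p f|^2\,d\rho^\varepsilon_t\,dt. \]

Define $\mathcal A := \{f = g\circ\xi : g \in C_c^{1,2}(\mathbb R\times\Gamma)\}$. Then we have

\[ I^\varepsilon(\rho^\varepsilon) \ge \sup_{f\in\mathcal A} \mathcal J^\varepsilon(\rho^\varepsilon,f). \]

Since $J\nabla H\cdot\nabla(g\circ\xi) = 0$, upon substituting $f = g\circ\xi$ into $\mathcal J^\varepsilon$ the $O(1/\varepsilon)$ term vanishes. Using the notation
$g'$ for the partial derivative with respect to $h$, $\partial_t g$ for the time derivative, and suppressing the dependence
of $g$ on time, we find

\[ \begin{aligned}
\mathcal J^\varepsilon(\rho^\varepsilon, g\circ\xi) = {}& \int_\Gamma g_T\,d\hat\rho^\varepsilon_T - \int_\Gamma g_0\,d\hat\rho^\varepsilon_0 - \int_0^T\!\!\int_{\mathbb R^2} \Bigl(\partial_t g(\xi(x)) + g''(\xi(x))\bigl(\nabla_p H(x)\bigr)^2 + g'(\xi(x))\Delta_p H(x)\Bigr)\,\rho^\varepsilon_t(dx)\,dt \\
& - \frac12\int_0^T\!\!\int_{\mathbb R^2} \bigl|g'(\xi(x))\nabla_p H(x)\bigr|^2\,\rho^\varepsilon_t(dx)\,dt.
\end{aligned} \tag{59} \]

The limit of (59) is determined term by term. Taking the fourth term as an example, using the co-area
formula and the local-equilibrium result of Lemma 3.3, the fourth term on the right-hand side of (59) gives

\[ \begin{aligned}
\int_0^T\!\!\int_{\mathbb R^2} g''(\xi(x))\bigl(\nabla_p H(x)\bigr)^2\,\rho^\varepsilon_t(dx)\,dt &\xrightarrow{\varepsilon\to0} \int_0^T\!\!\int_{\mathbb R^2} g''(\xi(x))\bigl(\nabla_p H(x)\bigr)^2\,\rho_t(dx)\,dt \\
&= \int_0^T\!\!\int_\Gamma g''(\gamma)\,\alpha_t(\gamma)\Bigl[\int_{\xi^{-1}(\gamma)} \frac{(\nabla_p H(y))^2}{|\nabla H(y)|}\,\mathscr H^1(dy)\Bigr]\,d\gamma\,dt = \int_0^T\!\!\int_\Gamma g''(\gamma)A(\gamma)\,\hat\rho_t(d\gamma)\,dt,
\end{aligned} \]

where $A : \Gamma \to \mathbb R$ is defined in (57). Proceeding similarly with the other terms we find

\[ \liminf_{\varepsilon\to0} I^\varepsilon(\rho^\varepsilon) \ge \sup_{g\in C_c^{1,2}(\mathbb R\times\Gamma)} \mathcal J(\hat\rho,g). \tag{60} \]

This concludes the proof of Theorem 3.6.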


□ 


3.7 Study of the limit problem 

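We now investigate the limiting functional $I$ from (55) a little further. The two main results of this section
are that $\mathcal J$ can be written as

\[ \mathcal J(\hat\rho,g) = \int_\Gamma g_T\,d\hat\rho_T - \int_\Gamma g_0\,d\hat\rho_0 - \int_0^T\!\!\int_\Gamma \Bigl[\partial_t g_t\,d\hat\rho_t + \Bigl((TA\,g_t')' + \frac12 TA\,(g_t')^2\Bigr)f_t\,d\gamma\Bigr]\,dt, \tag{61} \]

and that $I$ satisfies

\[ I(\hat\rho) \ge \sup_{g\in\mathcal A} \mathcal J(\hat\rho,g) \quad\text{for all } \hat\rho \in C([0,T];\mathcal P(\Gamma)), \tag{62} \]

where $\mathcal A$ is the larger class

\[ \mathcal A := \Bigl\{g \in C^{1,0}(\mathbb R\times\Gamma) : g|_{I_k} \in C_b^{1,2}(\mathbb R\times I_k),\ \forall\,\text{interior vertex } O_j\ \forall t : \sum_{k:\,I_k\sim O_j} \pm_{kj}\,g_t'(O_j,k)\,TA(O_j,k) = 0\Bigr\}. \tag{63} \]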
The admissible set $\mathcal A$ relaxes the conditions on $g$ at interior vertices: instead of requiring $g$ to have identical
derivatives coming from each edge, only a single scalar combination of the derivatives has to vanish. (In fact
it can be shown that equality holds in (62), but that requires a further study of the limiting equation that
takes us too far here.)

Both results use some special properties of $T$, $A$, and $B$, which are given by the following lemma. In
this lemma and below we use $TA$ and $TB$ for the functions obtained by multiplying $T$ with $A$ and $B$; these
combinations play a special role, and we treat them as separate functions.

Lemma 3.7 (Properties of $TA$ and $TB$). The functions $TA$ and $TB$ have the following properties.

1. $TA \in C^1(I_k)$ for each $k$, and $(TA)' = TB$;

2. $TA$ is bounded on compact subsets of $\Gamma$;

3. At each interior vertex $O_j$, for each $k$ such that $I_k \sim O_j$, $TA(O_j,k) := \lim_{h\in I_k,\ h\to O_j} TA(h,k)$ exists, and

\[ \sum_{k:\,I_k\sim O_j} \pm_{kj}\,TA(O_j,k) = 0. \tag{64} \]

From this lemma the expression (61) follows by simple manipulation.

With these two results, we can obtain a differential-equation characterization of those $\hat\rho$ with $I(\hat\rho) = 0$.
Assume that a $\hat\rho$ with $I(\hat\rho) = 0$ is given. By rescaling we find that for all $g \in \mathcal A$,

\[ 0 = \int_\Gamma g_T\,d\hat\rho_T - \int_\Gamma g_0\,d\hat\rho_0 - \int_0^T\!\!\int_\Gamma \Bigl[\partial_t g_t\,d\hat\rho_t + (TA\,g_t')'\,f_t\,d\gamma\Bigr]\,dt. \tag{65} \]

As already remarked we find a parabolic equation inside each edge of $\Gamma$,

\[ \partial_t\hat\rho_t = (TA\,f_t')' = (A\hat\rho_t)'' - (B\hat\rho_t)'. \tag{66} \]
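The second equality in (66) rests only on the relation $(TA)' = TB$ from Lemma 3.7. The following short formal computation spells it out on the interior of a single edge, writing $\hat\rho_t = f_t\,T$ (the density of $\hat\rho_t$ with respect to $d\gamma$) and assuming, for the purpose of this sketch, that $f_t$ is smooth enough there for all derivatives to make sense.

```latex
\begin{align*}
(A\hat\rho_t)'' - (B\hat\rho_t)'
  &= (TA\,f_t)'' - (TB\,f_t)'
     && \text{since } \hat\rho_t = f_t\,T \\
  &= \bigl((TA)'\,f_t + TA\,f_t'\bigr)' - (TB\,f_t)' \\
  &= \bigl(TB\,f_t + TA\,f_t'\bigr)' - (TB\,f_t)'
     && \text{since } (TA)' = TB \\
  &= (TA\,f_t')'.
\end{align*}
```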

We next determine the boundary and connection conditions at the vertices. 

Consider a single interior vertex Oj, and choose a function g G A such that swppg contains no other 
vertices. Writing pt{d'y) = ft{l)T{l)dj we find first that ft is continuous at Oj, by the definition (55) of I. 
Then, assuming that p is smooth enough for the following expressions to make sense^, we perform two partial 
integrations in 7 and one in time on (65) and substitute ( 66 ) to find 



The first term vanishes since $g \in A$, while the second term leads to the connection condition 

\[ \text{at each interior vertex } O_j: \quad \sum_{k:\, I_k \sim O_j} \pm_{kj}\, TA(O_j,k)\, f_t'(O_j,k) = 0. \]

The lower exterior vertices and the top vertex are inaccessible, in the language of [Fel52, Man68], and 
therefore require no boundary condition. Summarizing, we find that if $I(\rho) = 0$, then $\rho =: fT\,d\gamma$ satisfies a 
weak version of equation (66) with connection conditions 

\[ \text{at each interior vertex } O_j: \quad f \text{ is continuous and } \sum_{k:\, I_k \sim O_j} \pm_{kj}\, TA(O_j,k)\, f_t'(O_j,k) = 0. \]

¹ This can actually be proved using the properties of A and B near the vertices and applying standard parabolic regularity 
theory on each of the edges. 

This combination of equation and boundary conditions can be proved to characterize a well-defined semigroup 
using e.g. the Hille-Yosida theorem and the characterization of one-dimensional diffusion processes by Feller 
(e.g. [Fel52]). 
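
The interior equation and the vertex condition can be checked by a short computation. The following is a sketch under the assumptions already made in the text ($\rho_t = f_t T\,d\gamma$ with $f_t$ smooth up to the vertex on each edge, and $g$ supported near a single interior vertex $O_j$). The sign symbol $\pm_{kj}$ is used as in (64); its orientation convention is taken from the paper and is not re-derived here.

```latex
% Inside an edge: with \rho = fT and (TA)' = TB (Lemma 3.7, part 1),
\begin{align*}
(A\rho)'' - (B\rho)'
  &= \bigl((TA)'f + TA\,f'\bigr)' - (TB\,f)'
   = (TB\,f + TA\,f')' - (TB\,f)'
   = (TA\,f')' .
\end{align*}
% Two integrations by parts on an edge I_k adjacent to O_j
% (g vanishes near the other end of I_k, so only O_j contributes):
\begin{align*}
\int_{I_k} (TA\,g')'\,f\,dh
  &= \pm_{kj}\Bigl[\,TA(O_j,k)\,g'(O_j,k)\,f(O_j)
       - g(O_j)\,TA(O_j,k)\,f'(O_j,k)\Bigr]
     + \int_{I_k} g\,(TA\,f')'\,dh .
\end{align*}
```

Summing over the edges $k$ adjacent to $O_j$, the bulk terms cancel against the time derivative by the interior equation, the terms containing $g'(O_j,k)$ vanish because of the vertex condition built into the test-function class, and what remains is the condition $\sum_k \pm_{kj}\, TA(O_j,k)\, f'(O_j,k) = 0$.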

We now prove the inequality (62). 

Lemma 3.8 (Comparison of $I$ and $\hat I$). We have 

\[ I(\rho) \ge \hat I(\rho) := \sup_{g \in A} \hat J(\rho, g). \]

Proof. Take $\rho$ such that $I(\rho) < \infty$, implying that $\rho_t(d\gamma) = f_t(\gamma)T(\gamma)\,d\gamma$ with $f_t$ continuous on $\operatorname{Int}\Gamma$ for 
almost all $t$. Choose $g \in A$; we will show that $I(\rho) \ge \hat J(\rho, g)$, thus proving the lemma. For simplicity we 
only treat the case of a single interior vertex, called $O$; the case of multiple vertices is a simple generalization. 
For convenience we also assume that $O$ corresponds to $h = 0$. 

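Define 

\[ g_{\delta,t}(h,k) = g_t(h,k)\,\zeta_\delta(h) + \bigl(1 - \zeta_\delta(h)\bigr)\, g_t(0), \tag{67} \]

where $\zeta_\delta$ is a sequence of smooth functions such that 

• $\zeta_\delta$ is identically zero in a $\delta$-neighbourhood of $O$, and identically 1 away from a $2\delta$-neighbourhood of $O$; 

• $\zeta_\delta$ satisfies the growth conditions $|\zeta_\delta'| \le 2/\delta$ and $|\zeta_\delta''| \le 4/\delta^2$. 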

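We calculate $J(\rho, g_\delta)$. The limit of the first three terms is straightforward: by dominated convergence 
we obtain 

\[ \int_\Gamma g_{\delta,T}\, d\rho_T - \int_\Gamma g_{\delta,0}\, d\rho_0 - \int_0^T\!\!\int_\Gamma \partial_t g_{\delta,t}\, d\rho_t\, dt \;\xrightarrow{\ \delta \to 0\ }\; \int_\Gamma g_T\, d\rho_T - \int_\Gamma g_0\, d\rho_0 - \int_0^T\!\!\int_\Gamma \partial_t g_t\, d\rho_t\, dt. \]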


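Next consider the term 

\[ \int_0^T\!\!\int_\Gamma A(\gamma)\, g_\delta''(\gamma)\, \rho_t(d\gamma)\, dt = \int_0^T\!\!\int_\Gamma \Bigl[ g''(h,k)\,\zeta_\delta(h) + 2\zeta_\delta'(h)\, g'(h,k) + \zeta_\delta''(h)\bigl[h\, g'(0,k) + O(h^2)\bigr] \Bigr] A(\gamma)\, \rho_t(d\gamma)\, dt. \tag{68} \]

Since the function $(\gamma,t) \mapsto g_t''(\gamma)A(\gamma)$ is in $L^1(\rho_t)$, the first term in (68) again converges by dominated 
convergence: 

\[ \int_0^T\!\!\int_\Gamma g_t''(h,k)\,\zeta_\delta(h)\, A(h,k)\, \rho_t(d\gamma)\, dt \;\xrightarrow{\ \delta \to 0\ }\; \int_0^T\!\!\int_\Gamma g_t''(h,k)\, A(h,k)\, \rho_t(d\gamma)\, dt. \]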


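Abbreviate $f_t(\gamma)\,TA(\gamma)$ as $a(\gamma)$; note that $a$ is continuous and bounded in a neighbourhood of $O$. Write the 
second term on the right-hand side in (68) as (suppressing the time integral for the moment) 

\begin{align*}
2\sum_k \int_{I_k} \zeta_\delta'(h)\, g'(h,k)\, a(h,k)\, dh
 &= 2\sum_k \int_{I_k} \zeta_\delta'(h)\, g'(h,k)\bigl(a(h,k) - a(0,k)\bigr)\, dh
  + 2\sum_k a(0,k) \int_{I_k} \zeta_\delta'(h)\bigl(g'(h,k) - g'(0,k)\bigr)\, dh \\
 &\quad + 2\sum_k a(0,k)\, g'(0,k) \int_{I_k} \zeta_\delta'(h)\, dh \\
 &\xrightarrow{\ \delta\to 0\ }\; 0 + 0 - 2\sum_{k:\, I_k \sim O} \pm_{kO}\, g'(0,k)\, a(0,k)
  = -2\sum_{k:\, I_k \sim O} \pm_{kO}\, g'(0,k)\, f(0,k)\, TA(0,k).
\end{align*}

The limit above holds since $-\zeta_\delta'(\cdot,k)$ converges weakly to a signed Dirac, $\pm_{kO}\,\delta_0$, as $\delta \to 0$. Proceeding 
similarly with the remaining terms we have 

\begin{align*}
\lim_{\delta\to 0} J(\rho, g_\delta)
 &= \int_\Gamma g_T\, d\rho_T - \int_\Gamma g_0\, d\rho_0 - \int_0^T\!\!\int_\Gamma \bigl(\partial_t g_t + A(\gamma)\, g_t''(\gamma) + B(\gamma)\, g_t'(\gamma)\bigr)\, \rho_t(d\gamma)\, dt \\
 &\quad - \frac12 \int_0^T\!\!\int_\Gamma A(\gamma)\, g_t'(\gamma)^2\, \rho_t(d\gamma)\, dt
   - \int_0^T \sum_{k:\, I_k \sim O} f_t(0,k)\, \pm_{kO}\, TA(0,k)\, g_t'(0,k)\, dt.
\end{align*}

Note that the final term vanishes by the requirement that $g \in A$, and therefore the right-hand side above 
equals $\hat J(\rho, g)$. This concludes the proof of the lemma. □ 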


We still owe the reader the proof of Lemma 3.7. 


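Proof of Lemma 3.7. We first prove part 1. For simplicity, assume first that $H$ has a single well, and therefore 
$\Gamma$ has only one edge, $k = 1$. Since 

\[ \operatorname{div} \begin{pmatrix} 0 \\ \nabla_p H \end{pmatrix} = \Delta_p H, \]

and remarking that the exterior normal $n$ to the set $\{H \le h\}$ equals $\nabla H/|\nabla H|$, 
we calculate that 

\[ \int_{\{H \le h\}} \Delta_p H = \int_{\{H = h\}} \frac{|\nabla_p H|^2}{|\nabla H|}\, d\mathcal{H}^1 = TA(h). \tag{69} \]

By the smoothness of $H$, the derivative of the left-hand integral is well-defined for all $h$ such that $\nabla H \neq 0$ 
at that level. At such $h$ we then have 

\[ TB(h) = \int_{\{H = h\}} \frac{\Delta_p H}{|\nabla H|}\, d\mathcal{H}^1 = \partial_h \int_{\{H \le h\}} \Delta_p H = \partial_h\, TA(h). \]

For the multi-well case, this argument can simply be applied to each branch of $\Gamma$. 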

For part 2, since $H$ is coercive, $\{H \le h\}$ is bounded for each $h$; since $H$ is smooth, $\Delta_p H$ is therefore 
bounded on bounded sets. From (69) it follows that $TA$ also is bounded on bounded sets of $\Gamma$. 

Finally, for part 3, note first that $TB$ is bounded near each interior vertex. This follows by an explicit 
calculation and our assumption that each interior vertex corresponds to exactly one, non-degenerate, saddle 
point. Since $(TA)' = TB$, $TA$ has a well-defined and finite limit at each interior saddle. The summation 
property (64) follows from comparing (69) for values of $h$ just above and below the critical value. For 
instance, in the case of a single saddle at value $h = 0$, with two lower edges $k = 1, 2$ and upper edge $k = 0$, 
we have 

\begin{align*}
\lim_{h \uparrow 0} \bigl[TA(h,1) + TA(h,2)\bigr]
 &= \lim_{h \uparrow 0} \Bigl[\, \int_{\xi^{-1}((-\infty,h] \times \{1\})} \Delta_p H + \int_{\xi^{-1}((-\infty,h] \times \{2\})} \Delta_p H \Bigr]
  = \lim_{h \uparrow 0} \int_{\{H \le h\}} \Delta_p H \\
 &= \lim_{h \downarrow 0} \int_{\{H \le h\}} \Delta_p H = \lim_{h \downarrow 0} TA(h,0).
\end{align*}

This concludes the proof of Lemma 3.7. □ 


3.8 Conclusion and discussion 

The combination of Theorems 3.2 and 3.6 gives us that along subsequences $\hat\rho^\varepsilon := \xi_\# \rho^\varepsilon$ converges in an 
appropriate manner to some $\rho$, and that 

\[ I(\rho) \le \liminf_{\varepsilon \to 0} I^\varepsilon(\rho^\varepsilon). \]

In addition, any $\rho$ satisfying $I(\rho) = 0$ is a weak solution of the PDE 

\[ \partial_t \rho = (A\rho)'' - (B\rho)' \]

on the graph $\Gamma$. This is the central coarse-graining statement of this section. We also obtain the boundary 
conditions, similarly as in the conventional weak-formulation method, by expanding the admissible set of 
test functions. 

In switching from the VFP equation (9) to equation (41) we removed two terms, representing the friction 
with the environment and the interaction between particles. Mathematically, it is straightforward to treat 
the case with friction, which leads to an additional drift term in the limit equation in the direction of 
decreasing h. We left this out simply for the convenience of shorter expressions. 

As for the interaction, represented by the interaction potential $\psi$, again there is no mathematical necessity 
for setting $\psi = 0$ in this section; the analysis continues rather similarly. However, the limiting equation will 
now be non-local, since the particles at some $\gamma \in \Gamma$, which can be thought of as ‘living’ on a full connected 
level set of $H$, will feel a force exerted by particles at a different $\gamma' \in \Gamma$, i.e. at a different level set component. 
This makes the interpretation of the limiting equation somewhat convoluted. 

The results of the current and the next sections were proved by Freidlin and co-authors in a series of 
papers [FW93, FW94, FW98, FW01, FW04], using probabilistic techniques. Recently, Barret and Von 
Renesse [BvR14] provided an alternative proof using Dirichlet forms and their convergence. The latter 
approach is closer to ours in the sense that it is mainly a PDE-based method and of variational type. However, 
in [BvR14] the authors consider a perturbation of the Hamiltonian by a friction term and a non-degenerate 
noise, i.e. the noise is present in both space and momentum variables; this non-degeneracy appears to be 
essential in their method. Moreover, their approach invokes a reference measure which is required to satisfy 
certain non-trivial conditions. In contrast, the approach of this paper is applicable to degenerate noise and 
does not require such a reference measure. In addition, certain non-linear evolutions can be treated, such as 
the example of the VFP equation. 

4 Diffusion on a graph, d > 1 

We now switch to our final example. As described in the introduction, the higher-dimensional analogue of 
the diffusion-on-graph system has an additional twist: in order to obtain unique stationary measures on level 
sets of $H$ we need to add an additional noise in the SDE, or equivalently, an additional diffusion term in the 
PDE. This leads to the equation 

\[ \partial_t \rho = -\frac{1}{\varepsilon}\operatorname{div}(\rho J \nabla H) + \frac{\kappa}{\varepsilon}\operatorname{div}(a \nabla \rho) + \Delta_p \rho, \tag{70} \]

where $a : \mathbb{R}^{2d} \to \mathbb{R}^{2d \times 2d}$ with $a \nabla H = 0$, $\dim(\operatorname{Ker}(a)) = 1$ and $\kappa, \varepsilon > 0$ with $\kappa \gg \varepsilon$. The spatial domain is 
$\mathbb{R}^{2d}$, $d > 1$, with coordinates $(q,p) \in \mathbb{R}^d \times \mathbb{R}^d$. Here the unknown is a trajectory in the space of probability 
measures $\rho : [0,T] \to \mathcal{P}(\mathbb{R}^{2d})$; the Hamiltonian is the same as in the previous section, $H : \mathbb{R}^{2d} \to \mathbb{R}$ given by 

\[ H(q,p) = \frac{p^2}{2m} + V(q). \]

The results for the limit $\varepsilon \to 0$ in (70) closely mirror the one-degree-of-freedom diffusion-on-graph problem 
of the previous section; the only real difference lies in the proof of local equilibrium (Lemma 3.3). For a 
rigorous proof of this lemma in this case, based on probabilistic techniques, we refer to [FW01, Lemma 3.2]; 
here we only outline a possible analytic proof. 

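Along the lines of Theorem 3.1, and using boundedness of the rate functional $I^\varepsilon(\rho^\varepsilon)$, one can show that 

\[ \frac12 \int_0^T\!\!\int_{\mathbb{R}^{2d}} \frac{|\nabla_p \rho^\varepsilon|^2}{\rho^\varepsilon} + \frac{\kappa}{\varepsilon} \int_0^T\!\!\int_{\mathbb{R}^{2d}} \frac{a \nabla \rho^\varepsilon \cdot \nabla \rho^\varepsilon}{\rho^\varepsilon} \le C. \]

Multiplying this inequality by $\varepsilon/\kappa$ and using the weak convergence $\rho^\varepsilon \rightharpoonup \rho$ along with the lower-semicontinuity 
of the Fisher information [FK06, Theorem D.45] we find 

\[ \int_0^T\!\!\int_{\mathbb{R}^{2d}} \frac{a \nabla \rho \cdot \nabla \rho}{\rho} = 0, \]

or in variational form, for almost all $t \in [0,T]$, 

\[ 0 = \sup_{\varphi \in C_c^\infty(\mathbb{R}^{2d})} \int_{\mathbb{R}^{2d}} \operatorname{div}(a \nabla \varphi)\, \rho_t - \frac12 \int_{\mathbb{R}^{2d}} a \nabla \varphi \cdot \nabla \varphi\, \rho_t
 \quad\Longleftrightarrow\quad 0 = \int_{\mathbb{R}^{2d}} \operatorname{div}(a \nabla \varphi)\, \rho_t \quad \forall \varphi \in C_c^\infty(\mathbb{R}^{2d}). \]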

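Applying the co-area formula we find 

\[ \int_{\xi^{-1}(\gamma)} \frac{\rho(x)}{|\nabla H(x)|}\, \operatorname{div}\bigl(a(x) \nabla \varphi(x)\bigr)\, \mathcal{H}^{2d-1}(dx) = 0, \tag{71} \]

where $\mathcal{H}^{2d-1}$ is the $(2d-1)$-dimensional Hausdorff measure. Let $M_\gamma$ be the $(2d-1)$-dimensional manifold 
$\xi^{-1}(\gamma)$ with volume element $|\nabla H|^{-1}\mathcal{H}^{2d-1}$. Then (71) becomes 

\[ \int_{M_\gamma} \rho(x)\, \operatorname{div}_M\bigl(a(x) \nabla_M \varphi(x)\bigr)\, \mathrm{vol}_M(dx) = 0, \]

where $\operatorname{div}_M$ and $\nabla_M$ are the corresponding differential operators on $M_\gamma$, and $\mathrm{vol}_M$ is the induced volume 
measure. Since $a \nabla H = 0$ and $\dim(\operatorname{Ker}(a)) = 1$, $a$ is non-degenerate on the tangent space of $M_\gamma$. Therefore, 
given $\psi \in C^\infty(M_\gamma)$ with $\int_{M_\gamma} \psi\, d\mathrm{vol}_M = 0$, we can solve the corresponding Laplace-Beltrami-Poisson 
equation for $\varphi$, 

\[ \operatorname{div}_M(a \nabla_M \varphi) = \psi, \]

and therefore 

\[ \int_{M_\gamma} \rho\, \psi\, d\mathrm{vol}_M = 0 \qquad \forall \psi \in C^\infty(M_\gamma) \text{ with } \int_{M_\gamma} \psi\, d\mathrm{vol}_M = 0. \]

Since $M_\gamma$ is connected by definition, it follows that $\rho$ is constant on $M_\gamma$; this is the statement of Lemma 3.3. 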
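
The last implication can be made explicit. The following is a sketch, assuming that $M_\gamma$ is compact with finite volume and that $\rho$ restricted to $M_\gamma$ is regular enough to be used (after approximation) as a test function; connectedness enters through the solvability of the Poisson equation, whose kernel then consists of constants only.

```latex
% Let \bar\rho := \frac{1}{\mathrm{vol}_M(M_\gamma)} \int_{M_\gamma} \rho \, d\mathrm{vol}_M
% and test with \psi := \rho - \bar\rho, which has zero mean:
\begin{align*}
0 = \int_{M_\gamma} \rho\,\psi \, d\mathrm{vol}_M
  &= \int_{M_\gamma} (\rho - \bar\rho)^2 \, d\mathrm{vol}_M
     + \bar\rho \int_{M_\gamma} (\rho - \bar\rho) \, d\mathrm{vol}_M \\
  &= \int_{M_\gamma} (\rho - \bar\rho)^2 \, d\mathrm{vol}_M ,
\end{align*}
% hence \rho = \bar\rho almost everywhere on M_\gamma.
```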


5 Conclusion and discussion 


In this paper we have presented a structure in which coarse-graining and ‘passing to a limit’ combine in 
a natural way, and which extends also naturally to a class of approximate solutions. The central object 
is the rate function I, which is minimal and vanishes at solutions; in the dual formulation of this rate 
function, coarse-graining has a natural interpretation, and the inequalities of the dual formulation and of 
the coarse-graining combine in a convenient way. 

We now comment on a number of issues related to this method. 


Why does this method work? One can wonder why the different pieces of the arguments of this paper 
fit together. Why do the relative entropy and the relative Fisher information appear? To some extent this 
can be recognized in the similarity between the duality definition of the rate function $I$ and the duality 
characterization of relative entropy and relative Fisher information. The details of Appendix B show this 
most clearly, but the similarity between the duality definition of the relative Fisher information and the 
duality structure of $I$ can readily be recognized: in (19) combined with (18) we collect the $O(\gamma^2)$ terms 

\[ -\gamma^2 \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Bigl[\Delta_p f_t - \frac{p}{m}\cdot\nabla_p f_t + \frac12 |\nabla_p f_t|^2\Bigr]\, d\rho_t\, dt, \]

and these match one-to-one to the definition (24). This shows how the structure of the relative Fisher 
information is to some extent ‘built-in’ in this system. 

Relation with other variational formulations. Our variational formulation (2) to ‘passing to a limit’ is 
closely related to other variational formulations in the literature, notably the $\Psi$-$\Psi^*$ formulation and the 
method in [PSCF05, ASZ09]. In the $\Psi$-$\Psi^*$ formulation, a gradient flow of the energy $E_\varepsilon : Z \to \mathbb{R}$ with 
respect to the dissipation $\Psi_\varepsilon^*$ is defined to be a curve $\rho^\varepsilon \in C^1([0,T], Z)$ such that 

\[ A^\varepsilon(\rho) := E_\varepsilon(\rho_T) - E_\varepsilon(\rho_0) + \int_0^T \bigl[\Psi_\varepsilon(\rho_t, \dot\rho_t) + \Psi_\varepsilon^*(\rho_t, -DE_\varepsilon(\rho_t))\bigr]\, dt = 0. \tag{72} \]

‘Passing to a limit’ in a $\Psi$-$\Psi^*$ structure is then accomplished by studying (Gamma-)limits of the 
functionals $A^\varepsilon$. The method introduced in [PSCF05, ASZ09] is slightly different. Therein ‘passing to a limit’ 
in the evolution equation is executed by studying (Gamma-)limits of the functionals that appear in the 
approximating discrete minimizing-movement schemes. 
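
Why the single scalar equation (72) encodes the gradient flow can be seen from the Fenchel-Young inequality. The following is a sketch under standard assumptions that are not restated in the text: $\Psi_\varepsilon(z,\cdot)$ is convex with Legendre dual $\Psi_\varepsilon^*(z,\cdot)$, and the chain rule holds along the curve.

```latex
% Assumed: \Psi_\varepsilon(z,\cdot) convex, \Psi_\varepsilon^*(z,\cdot) its Legendre dual,
% and \frac{d}{dt} E_\varepsilon(\rho_t) = \langle DE_\varepsilon(\rho_t), \dot\rho_t \rangle.
\begin{align*}
\Psi_\varepsilon(\rho_t,\dot\rho_t) + \Psi_\varepsilon^*\bigl(\rho_t,-DE_\varepsilon(\rho_t)\bigr)
  &\ge \langle -DE_\varepsilon(\rho_t),\dot\rho_t\rangle
   = -\frac{d}{dt}E_\varepsilon(\rho_t) , \\
\text{hence}\qquad
A^\varepsilon(\rho) &\ge E_\varepsilon(\rho_T)-E_\varepsilon(\rho_0)
   - \int_0^T \frac{d}{dt}E_\varepsilon(\rho_t)\,dt = 0 ,
\end{align*}
% with equality if and only if, for almost every t,
%   \dot\rho_t \in \partial_\xi \Psi_\varepsilon^*\bigl(\rho_t, -DE_\varepsilon(\rho_t)\bigr).
```

So $A^\varepsilon$ is non-negative and vanishes exactly on solutions, which is the same "minimal and zero at solutions" structure that the rate functional has in this paper.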

The similarities between these two approaches and ours are that all the methods hinge on the duality structure 
of the relevant functionals, allow one to obtain both compactness and limiting results, and can work with 
approximate solutions, see e.g. [AMP+12] and the papers above for details. In addition, all methods assume 
some sort of well-prepared initial data, such as bounded initial free energy and boundedness of the functionals. 
Our assumptions on the boundedness of the rate functionals arise naturally in the context of a large-deviation 
principle since this assumption describes events of a certain degree of ‘improbability’. 

The main difference is that the method of this paper makes no use of the gradient-flow structure, and 
therefore also applies to non-gradient-flow systems as in this paper. The first example, of the overdamped 
limit of the VFP equation, also is interesting in the sense that it derives a dissipative system from a non- 
dissipative one. Since the GENERIC framework unifies both dissipative and non-dissipative systems, we 
expect that the method of this paper could be used to derive evolutionary convergence for GENERIC systems 
(see the next point). Finally, we emphasize that using the duality of the rate functional is mathematically 
convenient because we do not need to treat the three terms in the right-hand side of (72) separately. Note 
that although the entropy and energy functionals as well as the dissipation mechanism are not explicitly 
present in this formulation, we are still able to derive an energy-dissipation inequality in (4). 

Relation with GENERIC. As mentioned in the introduction, the Vlasov-Fokker-Planck system (8) combines 
both conservative and dissipative effects. In fact it can be cast into the GENERIC form by introducing 
an excess-energy variable $e$, depending only on time, that captures the fluctuation of energy due to dissipative 
effects (but does not change the evolution of the system). The building blocks of the GENERIC for the 
augmented system for $(\rho, e)$ can be easily deduced from the conservative and dissipative effects of the original 
Vlasov-Fokker-Planck equation. Moreover, this GENERIC structure can be derived from the large-deviation 
rate functional of the empirical process (7). We refer to [DPZ13] for more information. This suggests that 
our method could be applied to other GENERIC systems. 

Gradient flows and large-deviation principles. As mentioned in the introduction, this approach using the 
duality formulation of the rate functionals is motivated by our recent results on the connection between 
generalised gradient flows and large-deviation principles [ADPZ11, ADPZ13, DPZ14, DPZ13, DLZ12, MPR14]. 
We want to discuss here how the two overlap but are not the same. In [MPR14], the authors show that if $\mathcal{N}^\varepsilon$ 
is the adjoint operator of a generator of a Markov process that satisfies a detailed balance condition, then 
the evolution (1) is the same as the generalised gradient flow induced from a large-deviation rate functional, 
which is of the form $\int_0^T \mathcal{L}^\varepsilon(\rho_t, \dot\rho_t)\, dt$, of the underlying empirical process. The generalised gradient flow 
is described via the $\Psi$-$\Psi^*$ structure as in (72) with $\mathcal{L}^\varepsilon(z, \dot z) = \Psi_\varepsilon(z, \dot z) + \Psi_\varepsilon^*(z, -DE_\varepsilon(z)) + \langle DE_\varepsilon(z), \dot z \rangle$. 

Moreover, $E_\varepsilon$ and $\Psi_\varepsilon$ can be determined from $\mathcal{L}^\varepsilon$ [MPR14, Theorem 3.3]. However, it is not clear if such 
a characterisation holds true for systems that do not satisfy detailed balance. In addition, there exist 
(generalised) gradient flows for which we currently do not know of any corresponding microscopic particle systems, 
such as the Allen-Cahn and Cahn-Hilliard equations. 

Quantification of coarse-graining error. The use of the rate functional in a central role allows us not 
only to derive the limiting coarse-grained system but also to obtain quantitative estimates of the 
coarse-graining error. Existing quantitative methods such as [LL10] and [GOVW09] only work for gradient-flow 
systems since they rely crucially on the gradient-flow structure. The essential estimate that they need is the 
energy-dissipation inequality, which is similar to (4). Since we are able to obtain this inequality from the 
duality formulation of the rate functionals, our method would offer an alternative technique for obtaining 
quantitative estimates of the coarse-graining error for both dissipative and non-dissipative systems. We 
address this issue in detail in a companion article [DLP+]. 

Other stochastic processes. The key ingredient of the method is the duality structure of the rate functional 
(5) and (10). This duality formulation holds true for many other stochastic processes; indeed, the 
‘Feng-Kurtz’ algorithm (see chapter 1 of [FK06]) suggests that the large-deviation rate functional for a very wide 
class of Markov processes can be written as 

\[ I(\rho) = \sup_f \Bigl\{ \langle f_T, \rho_T \rangle - \langle f_0, \rho_0 \rangle - \int_0^T \langle \partial_t f_t, \rho_t \rangle\, dt - \int_0^T \mathcal{H}(\rho_t, f_t)\, dt \Bigr\}, \]

where $\mathcal{H}$ is an appropriate limit of ‘non-linear’ generators. The formula (10) is a special case. As a result, 
we expect that the method can be extended to this same wide class of Markov processes. 
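
As an illustration of how the general formula specializes, one can match terms with the rate functional (73) used in Appendix B. The identification below is read off from (73) and (74) by comparison and is a sketch only; it is not a statement taken from [FK06].

```latex
% Matching the general dual formula with (73)-(74):
\[
\mathcal{H}(\rho, f)
  = \int_{\mathbb{R}^{2d}} \Bigl( \mathcal{L}_{\rho} f
      + \frac{\gamma^{2}}{2}\,|\nabla_{p} f|^{2} \Bigr)\, d\rho ,
\qquad
\mathcal{L}_{\rho} f
  = \gamma\, J\nabla(H + \psi * \rho)\cdot\nabla f
    - \gamma^{2}\,\frac{p}{m}\cdot\nabla_{p} f
    + \gamma^{2}\,\Delta_{p} f .
\]
```

The linear part $\mathcal{L}_\rho f$ carries the deterministic and drift effects, while the quadratic term in $\nabla_p f$ reflects the noise, which acts in the momentum variable only.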

A Proof of Lemma 2.1 

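Define $X(f)$ to be the right-hand side in (25), 

\[ X(f) := \begin{cases} \displaystyle \int_{\mathbb{R}^{2d}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr|^2 f\, dqdp, & \text{if } \nabla_p f \in L^1_{\mathrm{loc}}(dqdp), \\[2mm] \infty & \text{otherwise,} \end{cases} \]

for $f \in L^1(\mathbb{R}^{2d})$. We need to show that $X(f) = \mathcal{I}(f\, dqdp \,|\, \mu)$. 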

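First assume that $X$ is finite. Then $\frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \in L^2(f\, dqdp)$, which implies the following stronger 
statement. 

Lemma A.1. One has 

\[ \frac{\nabla_p f}{f} 1_{\{f>0\}},\ \frac{p}{m} \in L^2_\nabla(f\, dqdp), \]

where the space $L^2_\nabla(f\, dqdp)$ is defined as the closure of $\{\nabla_p \varphi : \varphi \in C_c^\infty(\mathbb{R}^{2d})\}$ with respect to the norm 
$\|\cdot\|^2_{f\, dqdp} := \int_{\mathbb{R}^{2d}} |\cdot|^2 f\, dqdp$. 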

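Assuming Lemma A.1 for the moment we rewrite $X(f)$ as 

\begin{align*}
X(f) &= \int_{\mathbb{R}^{2d}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr|^2 f\, dqdp
      = \Bigl\| -\operatorname{div}_p\Bigl( f \Bigl( \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr) \Bigr) \Bigr\|^2_{-1,\, f\, dqdp} \\
     &= \Bigl\| -\operatorname{div}_p\Bigl( \nabla_p f + f\, \frac{p}{m} \Bigr) \Bigr\|^2_{-1,\, f\, dqdp},
\end{align*}

where $\|\cdot\|_{-1,\, f\, dqdp}$ is the dual norm (in duality with $L^2_\nabla(f\, dqdp)$) from [DPZ13] and $1_{\{f>0\}} \nabla_p f = \nabla_p f$ 
holds due to Stampacchia’s Lemma [KS00, Theorem A.1]. Following the variational characterization of 
$\|\cdot\|_{-1,\, f\, dqdp}$ from [DPZ13, (11)] we finally obtain 

\begin{align*}
X(f) &= \sup_{\varphi \in C_c^\infty(\mathbb{R}^{2d})} 2 \int_{\mathbb{R}^{2d}} \Bigl( \nabla_p \varphi \cdot \Bigl( \nabla_p f + f\, \frac{p}{m} \Bigr) - \frac12 |\nabla_p \varphi|^2 f \Bigr)\, dqdp \\
     &= \sup_{\varphi \in C_c^\infty(\mathbb{R}^{2d})} 2 \int_{\mathbb{R}^{2d}} \Bigl( \nabla_p \varphi \cdot \frac{p}{m} - \Delta_p \varphi - \frac12 |\nabla_p \varphi|^2 \Bigr) f\, dqdp,
\end{align*}

which is the claimed result. The same reference also provides that $X(f) = \infty$ iff $\mathcal{I}(f\, dqdp \,|\, \mu) = \infty$. 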
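
The step from the dual norm to the supremum can be written out. The following is a sketch, assuming that the variational characterization cited from [DPZ13, (11)] has the usual form of a squared negative-Sobolev norm, $\|\xi\|_{-1}^2 = \sup_\varphi \{2\langle \xi, \varphi\rangle - \int |\nabla_p\varphi|^2 f\}$; the symbol $\xi$ is introduced here only as shorthand.

```latex
% Shorthand (introduced for this sketch): \xi := -\operatorname{div}_p(\nabla_p f + f\,p/m),
% paired with test functions by
%   \langle \xi, \varphi \rangle = \int \nabla_p\varphi \cdot (\nabla_p f + f\,p/m)\, dqdp .
\begin{align*}
\langle \xi,\varphi\rangle
 &= \int \Bigl(-\Delta_p\varphi + \nabla_p\varphi\cdot\frac{p}{m}\Bigr) f\,dqdp
   && \text{(integrate the first term by parts in } p) \\
\|\xi\|_{-1,\,f\,dqdp}^2
 &= \sup_{\varphi}\Bigl\{2\langle\xi,\varphi\rangle - \int|\nabla_p\varphi|^2 f\,dqdp\Bigr\} \\
 &= \sup_{\varphi}\, 2\int\Bigl(\nabla_p\varphi\cdot\frac{p}{m} - \Delta_p\varphi
      - \frac12|\nabla_p\varphi|^2\Bigr) f\,dqdp .
\end{align*}
```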
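Proof of Lemma A.1. We assume that $\frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \in L^2(f\, dqdp)$ and show that the two individual terms 
$\frac{\nabla_p f}{f} 1_{\{f>0\}}$ and $\frac{p}{m}$ are in $L^2_\nabla(f\, dqdp)$. Choose a smooth cut-off function $\eta_R = \eta(x/R)$ with $\eta : \mathbb{R}^{2d} \to [0,1]$, 
$\eta = 1$ on $B_1(0)$ and $\eta = 0$ in $\mathbb{R}^{2d} \setminus B_2(0)$. Then 

\begin{align*}
-\int_{\mathbb{R}^{2d}} \eta_R\, \frac{p}{m} \cdot \frac{\nabla_p f}{f} 1_{\{f>0\}}\, f
 &= -\int_{\mathbb{R}^{2d}} \eta_R\, \frac{p}{m} \cdot \nabla_p f\, 1_{\{f>0\}}
  = -\int_{\mathbb{R}^{2d}} \eta_R\, \frac{p}{m} \cdot \nabla_p \bigl(1_{\{f>0\}} f\bigr) \\
 &= \frac{1}{m} \int_{\mathbb{R}^{2d}} \bigl[\eta_R\, d + p \cdot \nabla_p \eta_R\bigr]\, 1_{\{f>0\}}\, f
  \le \frac{1}{m} \int_{\mathbb{R}^{2d}} \bigl[d\, \eta_R + |p \cdot \nabla_p \eta_R|\bigr]\, f =: b(R).
\end{align*}

As $R \to \infty$, the bound $b(R)$ converges to $d/m$. 
Therefore we have 

\begin{align*}
\int_{\mathbb{R}^{2d}} \eta_R \Bigl( \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} \Bigr|^2 + \Bigl| \frac{p}{m} \Bigr|^2 \Bigr) f
 &= \int_{\mathbb{R}^{2d}} \eta_R \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr|^2 f - 2 \int_{\mathbb{R}^{2d}} \eta_R\, \nabla_p f \cdot \frac{p}{m}\, 1_{\{f>0\}} \\
 &\le 2 b(R) + \int_{\mathbb{R}^{2d}} \eta_R \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr|^2 f.
\end{align*}

By passing to the limit $R \to \infty$ we obtain 

\[ \lim_{R \to \infty} \int_{\mathbb{R}^{2d}} \eta_R \Bigl( \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} \Bigr|^2 + \Bigl| \frac{p}{m} \Bigr|^2 \Bigr) f \le \int_{\mathbb{R}^{2d}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} + \frac{p}{m} \Bigr|^2 f + \frac{2d}{m} < \infty \]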


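and thus $\frac{\nabla_p f}{f} 1_{\{f>0\}},\ \frac{p}{m} \in L^2(f\, dqdp)$. To conclude the proof of Lemma A.1 it remains to show that 
$\frac{\nabla_p f}{f} 1_{\{f>0\}}$ and $\frac{p}{m}$ can be approximated by gradients of $C_c^\infty$-functions. To this end we consider, for $\varepsilon > 0$, 
the smooth cut-off function $\eta_\varepsilon := \eta(\varepsilon x)$ with $\eta$ as above and define 

\[ \varphi_\varepsilon := \Bigl[ \log\Bigl( \frac{1}{\varepsilon} \wedge (f \vee \varepsilon) \Bigr) - \log \varepsilon \Bigr]\, \eta_\varepsilon. \]

Then $\varphi_\varepsilon$ has compact support in $B_{2/\varepsilon}(0)$. Note that $\varphi_\varepsilon$ is not necessarily smooth, but by convolution with a 
mollifier we can also achieve smoothness. For the gradient one obtains 

\[ \nabla_p \varphi_\varepsilon = \begin{cases}
 1_{B_{1/\varepsilon}(0)} \dfrac{\nabla_p f}{f} + 1_{B_{2/\varepsilon}(0) \setminus B_{1/\varepsilon}(0)} \Bigl( \eta_\varepsilon \dfrac{\nabla_p f}{f} + \nabla_p \eta_\varepsilon\, (\log f - \log \varepsilon) \Bigr) & \text{for } \{\varepsilon < f < \tfrac{1}{\varepsilon}\}, \\[2mm]
 1_{B_{2/\varepsilon}(0) \setminus B_{1/\varepsilon}(0)}\, \nabla_p \eta_\varepsilon\, \bigl(\log \tfrac{1}{\varepsilon} - \log \varepsilon\bigr) & \text{for } \{f \ge \tfrac{1}{\varepsilon}\}, \\[2mm]
 0 & \text{for } \{f \le \varepsilon\}.
\end{cases} \]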

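Our aim is to show that 

\[ \int_{\mathbb{R}^{2d}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} - \nabla_p \varphi_\varepsilon \Bigr|^2 f\, dqdp \to 0 \quad \text{as } \varepsilon \to 0. \]

Indeed, 

\begin{align*}
\int_{\mathbb{R}^{2d}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} - \nabla_p \varphi_\varepsilon \Bigr|^2 f
 &= \int_{\{f \le \varepsilon\}} \Bigl| \frac{\nabla_p f}{f} 1_{\{f>0\}} \Bigr|^2 f
  + \int_{\{f \ge 1/\varepsilon\}} \Bigl| \frac{\nabla_p f}{f} - \nabla_p \eta_\varepsilon \bigl(\log \tfrac{1}{\varepsilon} - \log \varepsilon\bigr) 1_{B_{2/\varepsilon}(0) \setminus B_{1/\varepsilon}(0)} \Bigr|^2 f \\
 &\quad + \int_{\{\varepsilon < f < 1/\varepsilon\}} \Bigl| (1 - \eta_\varepsilon) \frac{\nabla_p f}{f} - \nabla_p \eta_\varepsilon (\log f - \log \varepsilon) \Bigr|^2 1_{B_{2/\varepsilon}(0) \setminus B_{1/\varepsilon}(0)}\, f \\
 &\quad + \int_{\{\varepsilon < f < 1/\varepsilon\}} \Bigl| \frac{\nabla_p f}{f} \Bigr|^2 1_{\mathbb{R}^{2d} \setminus B_{2/\varepsilon}(0)}\, f \\
 &=: \mathrm{I}_\varepsilon + \mathrm{II}_\varepsilon + \mathrm{III}_\varepsilon + \mathrm{IV}_\varepsilon.
\end{align*}

Since $\frac{\nabla_p f}{f} 1_{\{f>0\}} \in L^2(f\, dqdp)$ we directly conclude that $\mathrm{I}_\varepsilon$ and $\mathrm{IV}_\varepsilon$ vanish in the limit as $\varepsilon \to 0$. Concerning 
$\mathrm{II}_\varepsilon$ and $\mathrm{III}_\varepsilon$ we note that, for $\{\varepsilon < f < \tfrac{1}{\varepsilon}\}$, one has 

\[ |\nabla_p \eta_\varepsilon (\log f - \log \varepsilon)|^2 \le |\nabla_p \eta_\varepsilon|^2\, \bigl|\log \tfrac{1}{\varepsilon} - \log \varepsilon\bigr|^2 = 4\, |\nabla_p \eta_\varepsilon|^2 (\log \varepsilon)^2 \le C \varepsilon, \]

where we exploited $|\nabla_p \eta_\varepsilon|^2 \le C \varepsilon^2$ and $(\log \varepsilon)^2 \le C/\varepsilon$ for some $\varepsilon$-independent constant $C$. This shows that 
also $\mathrm{II}_\varepsilon$ and $\mathrm{III}_\varepsilon$ vanish in the limit as $\varepsilon \to 0$. To sum up, we conclude that $\frac{\nabla_p f}{f} 1_{\{f>0\}} \in L^2_\nabla(f\, dqdp)$. The 
calculation for $\frac{p}{m} = \nabla_p \frac{p^2}{2m}$ is similar. □ 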


B Proof of Theorem 2.3 


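In this appendix, we prove Theorem 2.3 using the method of the duality equation; see e.g. [ACP82, RK82, 
BKP85, Eid90] or [BKRS15, Ch. 9] for examples. Throughout this appendix $\gamma$ is fixed. 

We recall the functional $I^\gamma : C([0,T]; \mathcal{P}(\mathbb{R}^{2d})) \to \mathbb{R}$ defined in (19), 

\[ I^\gamma(\rho) = \sup_{f \in C_b^{1,2}(\mathbb{R} \times \mathbb{R}^{2d})} \Bigl[ \int_{\mathbb{R}^{2d}} f_T\, d\rho_T - \int_{\mathbb{R}^{2d}} f_0\, d\rho_0 - \int_0^T\!\!\int_{\mathbb{R}^{2d}} \bigl(\partial_t f_t + \mathcal{L}_{\rho_t} f_t\bigr)\, d\rho_t\, dt - \frac{\gamma^2}{2} \int_0^T\!\!\int_{\mathbb{R}^{2d}} |\nabla_p f_t|^2\, d\rho_t\, dt \Bigr], \tag{73} \]

where $\mathcal{L}_\nu$ is given by 

\[ \mathcal{L}_\nu f = \gamma\, J \nabla (H + \psi * \nu) \cdot \nabla f - \gamma^2\, \frac{p}{m} \cdot \nabla_p f + \gamma^2 \Delta_p f. \tag{74} \]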

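In addition to the duality definition of the Fisher information (24) we will use the Donsker-Varadhan 
duality characterization of the relative entropy (21) for two probability measures (see e.g. [DE97, Lemma 
1.4.3]) 

\[ \mathcal{H}(\nu \,|\, \mu) = \sup_{\phi \in C_b(\mathbb{R}^{2d})} \int_{\mathbb{R}^{2d}} \phi\, d\nu - \log \int_{\mathbb{R}^{2d}} e^{\phi}\, d\mu, \]

which implies the corresponding characterization of the free energy (22) 

\[ \mathcal{F}(\nu) = \sup_{\phi \in C_c^\infty(\mathbb{R}^{2d})} \int_{\mathbb{R}^{2d}} \Bigl[ \phi + \frac12 \psi * \nu \Bigr]\, d\nu - \log \int_{\mathbb{R}^{2d}} e^{\phi - H}\, dx + \log Z_H. \tag{75} \]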


We first present some intermediate results which we will use to prove Theorem 2.3. 

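Lemma B.1. Let $\rho \in C([0,T]; \mathcal{P}(\mathbb{R}^{2d}))$. 

1. The maps $t \mapsto \psi * \rho_t$ and $t \mapsto \nabla \psi * \rho_t$ are continuous from $[0,T]$ to $C_b(\mathbb{R}^d)$; 

2. If $I(\rho),\ \mathcal{H}(\rho_0 \,|\, Z_H^{-1} e^{-H}) < \infty$, then $\int H \rho_t < \infty$ for all $t \in [0,T]$. 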

Proof. The first part follows from the bound $\psi \in C_b^2(\mathbb{R}^d)$. Fix $\varepsilon > 0$, $t \in [0,T]$, and take a 
sequence $t_n \to t$. For each $n$, choose $x_n \in \mathbb{R}^d$ such that $|\psi * (\rho_t - \rho_{t_n})|(x_n) \ge \|\psi * (\rho_t - \rho_{t_n})\|_\infty - \varepsilon/2$. 
Since $\rho_{t_n} \to \rho_t$ narrowly, $\{\rho_{t_n}\}_n$ is tight, implying that $x_n$ can be chosen bounded; therefore there exists a 
subsequence (not relabelled) such that $x_n \to x$ as $n \to \infty$. Then 

\begin{align*}
|(\psi * \rho_t)(x_n) - (\psi * \rho_{t_n})(x_n)| &\le |(\psi * \rho_t)(x_n) - (\psi * \rho_t)(x)| + |(\psi * \rho_t)(x) - (\psi * \rho_{t_n})(x)| \\
 &\quad + |(\psi * \rho_{t_n})(x) - (\psi * \rho_{t_n})(x_n)|.
\end{align*}

The last term on the right-hand side satisfies 

\[ |(\psi * \rho_{t_n})(x) - (\psi * \rho_{t_n})(x_n)| \le \int |\psi(x - y) - \psi(x_n - y)|\, \rho_{t_n}(dy\, dz) \xrightarrow{\ n \to \infty\ } 0 \]

since $\psi(x_n - \cdot) \to \psi(x - \cdot)$ uniformly, and a similar argument applies to the first term. The middle term 
converges to zero by the narrow convergence of $\rho_{t_n}$ to $\rho_t$. This proves that the function $t \mapsto \psi * \rho_t$ is 
continuous; a similar argument applies to $t \mapsto \nabla \psi * \rho_t$. 

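For the second part, we take in (73) the function $f(q,p,t) = \zeta(H(q,p))$, where $\zeta \in C^\infty([0,\infty))$ is a 
smooth, bounded, increasing truncation of the function $f(s) = s$, satisfying $0 \le \zeta' \le 1$ and $\zeta'' \le 0$. Then we 
find 

\begin{align*}
\int_{\mathbb{R}^{2d}} \zeta(H)\, \rho_\tau - \int_{\mathbb{R}^{2d}} \zeta(H)\, \rho_0 - I(\rho)
 &\le \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( -\gamma \zeta'\, \frac{p}{m} \cdot \nabla_q \psi * \rho_t + \gamma^2 \Bigl( \zeta'' + \frac12 \zeta'^2 - \zeta' \Bigr) \frac{p^2}{m^2} + \gamma^2 \zeta'\, \frac{d}{m} \Bigr)\, d\rho_t\, dt \\
 &\le \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \frac12 |\nabla_q \psi * \rho_t|^2 + \gamma^2 \frac{d}{m} \Bigr)\, d\rho_t\, dt \le \frac{T}{2} \|\nabla_q \psi\|_\infty^2 + \gamma^2 \frac{d}{m}\, T.
\end{align*}

The result follows upon letting $\zeta$ converge to the identity. 

Note that this inequality gives a bound on $\int H \rho_t$ for fixed $\gamma$, but this bound breaks down when $\gamma \to \infty$. 
The bound (29), which is directly derived from (28), gives a $\gamma$-independent estimate. □ 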

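In the next few results we study certain properties of an auxiliary PDE and its connection to the rate 
functional. 

Theorem B.2. Given $\phi \in C_c^\infty(\mathbb{R}^{2d})$ and $\varphi \in C_c^\infty([0,T] \times \mathbb{R}^{2d})$ there exists a function $f \in L^1_{\mathrm{loc}}([0,T] \times \mathbb{R}^{2d})$ 
which satisfies the following equation a.e. in $L^1_{\mathrm{loc}}([0,T] \times \mathbb{R}^{2d})$ (i.e. for each compact set $K \subset [0,T] \times \mathbb{R}^{2d}$ 
the equation is satisfied with all weak derivatives and all terms in $L^1(K)$): 

\begin{align*}
\partial_t f + \mathcal{L}_{\rho_t} f + \frac{\gamma^2}{2} |\nabla_p f|^2 + \gamma\, J \nabla H \cdot \nabla \psi * \rho_t &= -\gamma^2 \Bigl( \Delta_p \varphi - \nabla_p H \cdot \nabla_p \varphi - \frac12 |\nabla_p \varphi|^2 \Bigr), \tag{76a} \\
f|_{t=T} &= \phi, \tag{76b}
\end{align*}

where $\mathcal{L}_{\rho_t}$ is defined in (74). The final-time condition (76b) is satisfied in the sense of traces in $L^1_{\mathrm{loc}}(\mathbb{R}^{2d})$ 
(which are well-defined since $\partial_t f \in L^1_{\mathrm{loc}}([0,T] \times \mathbb{R}^{2d})$). The solution satisfies $|f| \le C(1 + H)^{1/2}$ for each 
$t \in [0,T]$ and almost everywhere in $\mathbb{R}^{2d}$, for some constant $C > 0$. Finally, 

\[ t \mapsto \int_{\mathbb{R}^{2d}} e^{f_t - H}\, dx \quad \text{is non-decreasing.} \tag{77} \]

Proof. The Hopf-Cole transformation $f = 2 \log g$ and the time reversal $t \mapsto T - t$ transform equation (76a) 
into 

\[ \partial_t g - \gamma\, J \nabla (H + \psi * \rho_{T-t}) \cdot \nabla g + \gamma^2\, \nabla_p H \cdot \nabla_p g - \gamma^2 \Delta_p g = \frac{g}{2} \Bigl[ \gamma\, J \nabla H \cdot \nabla \psi * \rho_{T-t} + \gamma^2 \Bigl( \Delta_p \varphi - \nabla_p H \cdot \nabla_p \varphi - \frac12 |\nabla_p \varphi|^2 \Bigr)_{T-t} \Bigr], \tag{78} \]

with initial datum (now at time zero) $g_0 = e^{\phi/2}$. The analysis of equation (78) is non-standard and therefore 
we study this equation separately in Appendix C. The existence and uniqueness of a solution, with this initial 
value, follow from Corollary C.7. The solution $g$ satisfies (78) a.e. in $L^1_{\mathrm{loc}}([0,T] \times \mathbb{R}^{2d})$ by Proposition C.13. 
Furthermore, by Proposition C.10 there exist constants $\alpha_1, \alpha_2, \beta_1, \beta_2, \omega_1, \omega_2$ such that 

\[ \alpha_1 \exp\bigl( -\beta_1 t \sqrt{\omega_1 + H} \bigr) \le g \le \alpha_2 \exp\bigl( \beta_2 t \sqrt{\omega_2 + H} \bigr). \]

Finally, by Proposition C.11 we have 

\[ t \mapsto \int_{\mathbb{R}^{2d}} g_t^2\, e^{-H}\, dx \quad \text{is non-increasing.} \]

Transforming back to $f$ we find the result. □ 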
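
The reason the Hopf-Cole transformation removes the quadratic term can be seen in one line. The following is a sketch for a positive, twice differentiable $g$; only the momentum derivatives matter, because the nonlinearity in (76a) involves $\nabla_p f$ only.

```latex
\begin{align*}
f = 2\log g:\qquad
\nabla_p f &= \frac{2\,\nabla_p g}{g}, \qquad
\Delta_p f = \frac{2\,\Delta_p g}{g} - \frac{2\,|\nabla_p g|^2}{g^2}, \qquad
\frac12 |\nabla_p f|^2 = \frac{2\,|\nabla_p g|^2}{g^2}, \\
\text{so that}\qquad
\gamma^2 \Delta_p f + \frac{\gamma^2}{2}\,|\nabla_p f|^2 &= \frac{2\gamma^2\,\Delta_p g}{g} .
\end{align*}
```

All remaining terms of (76a) are linear in $f$ or independent of $f$, so after multiplication by $g/2$ the equation for $g$ is linear; the zeroth-order terms of (76a) become a potential term proportional to $g$.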
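To prove the second main result on the auxiliary equation (76a), which is Proposition B.4 below, we 
will need the following lemma. For the rest of this appendix we write $*_t$ for convolution in time and $*_x$ for 
convolution in space ($x = (q,p)$). (The convolution $\psi *_x \rho$ is the same as the notation $\psi * \rho$ used in the rest 
of this paper.) 

Lemma B.3. Let $f$ satisfy 

\[ \partial_t f + \mathcal{L}_{\rho_t} f + \frac{\gamma^2}{2} |\nabla_p f|^2 = \Phi \tag{79} \]

a.e. in $L^1_{\mathrm{loc}}(\mathbb{R} \times \mathbb{R}^{2d})$ with $\Phi \in L^1_{\mathrm{loc}}(\mathbb{R} \times \mathbb{R}^{2d})$. Define $f_\delta := \nu_\delta *_x f$ and $f_\varepsilon := \eta_\varepsilon *_t f$, where $\eta_\varepsilon = \eta_\varepsilon(t)$ is 
a regularizing sequence in the $t$-variable and $\nu_\delta = \nu_\delta(q,p)$ is a regularizing sequence in the $(q,p)$-variables. 
Then we have 

\begin{align*}
\partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 &\le \nu_\delta *_x \Phi + \gamma \delta\, \|d^2 H\|_{L^\infty} \bigl( \nu_\delta *_x |\nabla f| + \gamma\, \nu_\delta *_x |\nabla_p f| \bigr) \\
 &\quad + \gamma \bigl( J \nabla \psi *_x \rho_t \cdot \nabla f_\delta - \nu_\delta *_x (J \nabla \psi *_x \rho_t \cdot \nabla f) \bigr), \tag{80} \\
\partial_t f_\varepsilon + \mathcal{L}_{\rho_t} f_\varepsilon + \frac{\gamma^2}{2} |\nabla_p f_\varepsilon|^2 &\le \eta_\varepsilon *_t \Phi + \gamma \bigl( J \nabla \psi *_x \rho_t \cdot \nabla f_\varepsilon - \eta_\varepsilon *_t (J \nabla \psi *_x \rho_t \cdot \nabla f) \bigr). \tag{81}
\end{align*}

Proof of Lemma B.3. Using (79) and the definition of $\mathcal{L}_{\rho_t}$ we have 

\begin{align*}
0 &= \int_{\mathbb{R}} \eta_\varepsilon(t - \tau) \Bigl( \partial_t f + \gamma^2 \Delta_p f - \gamma^2 \nabla_p H \cdot \nabla_p f + \gamma\, J \nabla (H + \psi *_x \rho_\tau) \cdot \nabla f + \frac{\gamma^2}{2} |\nabla_p f|^2 - \Phi \Bigr)(\tau, x)\, d\tau \\
 &= \bigl( \partial_t f_\varepsilon + \gamma^2 \Delta_p f_\varepsilon - \gamma^2 \nabla_p H \cdot \nabla_p f_\varepsilon + \gamma\, J \nabla (H + \psi *_x \rho_t) \cdot \nabla f_\varepsilon \bigr)(t,x) + \frac{\gamma^2}{2}\, \eta_\varepsilon *_t |\nabla_p f|^2 - \eta_\varepsilon *_t \Phi(t,x) \\
 &\quad + \gamma \int_{\mathbb{R}} \eta_\varepsilon(t - \tau) \bigl( J \nabla \psi *_x \rho_\tau - J \nabla \psi *_x \rho_t \bigr) \cdot \nabla f(\tau, x)\, d\tau. \tag{82}
\end{align*}

By Jensen’s inequality we have $\eta_\varepsilon *_t |\nabla_p f|^2 \ge |\nabla_p f_\varepsilon|^2$. Substituting this inequality into the relation above 
completes the proof of (81). The proof of (80) follows similarly. □ 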
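
The Jensen step deserves one explicit line. The following is a sketch, assuming the usual properties of a regularizing sequence, $\eta_\varepsilon \ge 0$ and $\int_{\mathbb{R}} \eta_\varepsilon = 1$, so that $\eta_\varepsilon(t - \tau)\, d\tau$ is a probability measure for each $t$.

```latex
% Assumed: \eta_\varepsilon \ge 0 and \int_{\mathbb{R}} \eta_\varepsilon = 1.
\begin{align*}
|\nabla_p f_\varepsilon(t,x)|^2
 &= \Bigl|\int_{\mathbb{R}}\eta_\varepsilon(t-\tau)\,\nabla_p f(\tau,x)\,d\tau\Bigr|^2 \\
 &\le \int_{\mathbb{R}}\eta_\varepsilon(t-\tau)\,|\nabla_p f(\tau,x)|^2\,d\tau
  = \bigl(\eta_\varepsilon *_t |\nabla_p f|^2\bigr)(t,x) ,
\end{align*}
% by convexity of v \mapsto |v|^2.
```

This is the only place where the equality (79) degrades to the inequality (81): mollification commutes with the linear terms up to the commutator, but only sub-commutes with the convex quadratic term.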

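The next result connects the solution of the auxiliary equation (76a) to the rate functional (73). 

Proposition B.4. Let $f$ be the solution of (76a)-(76b) in the sense of Theorem B.2. Then for $\tau \in [0,T]$ 
we have 

\begin{align*}
\int_{\mathbb{R}^{2d}} \rho_\tau \Bigl( f_\tau + \frac12 \psi *_x \rho_\tau \Bigr) - \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f + \mathcal{L}_{\rho_t} f + \frac{\gamma^2}{2} |\nabla_p f|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \\
 \le I(\rho) + \mathcal{F}(\rho_0) + \log \int_{\mathbb{R}^{2d}} e^{f_0 - H}\, dx - \log Z_H. \tag{83}
\end{align*}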

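Proof. We first show that for every $\tau \in [0,T]$, 

\[ I(\rho) \ge \sup_{f \in \mathcal{A}} \int_{\mathbb{R}^{2d}} \rho \Bigl( f + \frac12 \psi *_x \rho \Bigr) \Big|_0^\tau - \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f + \mathcal{L}_{\rho_t} f + \frac{\gamma^2}{2} |\nabla_p f|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt, \tag{84} \]

where 

\[ \mathcal{A} = \bigl\{ f \in C^{1,2}([0,T] \times \mathbb{R}^{2d}) : |\partial_t f|,\ |\nabla f|^2,\ |\Delta f| \le C(1 + H) \bigr\}. \]

Formally, this follows from substituting in the rate functional (73) $\tilde f(t,x) = [\psi *_x \rho + f](t,x)\, \chi_{[0,\tau]}(t)$ with 
$f \in \mathcal{A}$, and where $\chi_{[0,\tau]}$ is the characteristic function of the interval $[0,\tau]$. The rigorous proof follows by 
choosing in the rate functional (73) the function 

\[ f_n = \delta_n *_t \bigl( \zeta\, \delta_n *_t \psi *_x \rho \bigr) + f \zeta \]

for some $f \in \mathcal{A}$ and $\zeta \in C_c^\infty((0,\tau))$. Here $\delta_n(t) := n \delta(nt)$ is an approximation of a Dirac. Upon rearranging 
the time convolutions, letting $n \to \infty$, using Lemma B.1, and letting $\zeta$ converge to the function $\chi_{[0,\tau]}$, we 
recover (84). 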
From (84) we now derive (83). From here onwards we denote the expression in the supremum on the 
right-hand side of (84) by $\mathcal{J}(\rho, f)$ and use the notation 

\[ \Psi := -\gamma^2 \Bigl( \Delta_p \varphi - \nabla_p H \cdot \nabla_p \varphi - \frac12 |\nabla_p \varphi|^2 \Bigr). \tag{85} \]

Our aim is to substitute the solution $f$ of (76a)-(76b) into (84). To do this, we first extend $f$ outside 
$[0,T] \times \mathbb{R}^{2d}$ by constants and define 

\[ f_\delta := \nu_\delta *_x f, \qquad f_{\delta,\varepsilon} := \eta_\varepsilon *_t f_\delta, \]

where $\eta_\varepsilon(t)$, $\nu_\delta(q,p)$ are again regularizing sequences in time and space. The rest of the proof is divided into 
the following steps: 

1. We first show that $\mathcal{J}(\rho, f_{\delta,\varepsilon})$ is well defined. 

2. We then successively take the limits $\varepsilon \to 0$ and $\delta \to 0$ in $\mathcal{J}(\rho, f_{\delta,\varepsilon})$. 

3. We finally show that the limit satisfies (83). 

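Step 1. Let us first show that $\mathcal{J}(\rho, f_{\delta,\varepsilon})$ is well defined. From Theorem B.2 we know that $f$ satisfies 
$|f| \le C(1 + H)^{1/2}$ and therefore we find 

\[ |\partial_t f_{\delta,\varepsilon}|,\ |\nabla f_{\delta,\varepsilon}|,\ |J \nabla \psi *_x \rho_t \cdot \nabla f_{\delta,\varepsilon}|,\ |J \nabla H \cdot \nabla f_{\delta,\varepsilon}|,\ |\nabla H \cdot \nabla f_{\delta,\varepsilon}| \le C(1 + H), \tag{86} \]

where the constant $C$ depends on $\delta$ and $\varepsilon$. The last two objects are bounded since $|\nabla H|^2 \le C(1 + H)$; 
similar estimates hold for $f_\delta$. These bounds combined with Lemma B.1 imply that the integrals in $\mathcal{J}(\rho, f_{\delta,\varepsilon})$ 
are well defined and using (84) it follows that 

\[ \mathcal{J}(\rho, f_{\delta,\varepsilon}) \le I(\rho). \]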

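Step 2. Now we consider the convergence of $\mathcal{J}(\rho, f_{\delta,\varepsilon})$ as $\varepsilon \to 0$. Since all the derivatives of $f$ in (76a) 
are in $L^1_{\mathrm{loc}}((-2, T+2) \times \mathbb{R}^{2d})$ (recall that we have extended $f$ by constant functions of $(q,p)$ outside $[0,T]$) 
the same is true for the corresponding derivatives of $f_\delta := \nu_\delta *_x f$, and therefore using standard results, the 
following convergence results hold in $L^1_{\mathrm{loc}}(\mathbb{R} \times \mathbb{R}^{2d})$ as $\varepsilon \to 0$, 

\[ f_{\delta,\varepsilon} \to f_\delta, \qquad \partial_t f_{\delta,\varepsilon} \to \partial_t f_\delta, \qquad \nabla f_{\delta,\varepsilon} \to \nabla f_\delta, \qquad \Delta_p f_{\delta,\varepsilon} \to \Delta_p f_\delta. \tag{87} \]

Let us first consider the single-integral terms in $\mathcal{J}(\rho, f_{\delta,\varepsilon})$. Since $f_\delta \in W^{1,1}(0,T; L^1(B_R))$ for any $R > 0$, 
we have 

\[ f_{\delta,\varepsilon} \to f_\delta \quad \text{in } W^{1,1}(0,T; L^1(B_R)), \]

which together with the trace theorem implies that 

\[ f_{\delta,\varepsilon}\big|_{t=0,\tau} \xrightarrow{\ \varepsilon \to 0\ } f_\delta\big|_{t=0,\tau} \quad \text{in } L^1(B_R) \text{ and a.e. along a subsequence.} \tag{88} \]

Since the traces of $f_\delta$ and $f_{\delta,\varepsilon}$ at $t = 0, \tau$ are continuous in $(q,p)$, this convergence holds everywhere in $B_R$. 
Combining this convergence statement with the estimate (86) and Lemma B.1 and using the dominated 
convergence theorem we find 

\[ \int_{\mathbb{R}^{2d}} \rho_t \Bigl( f_{\delta,\varepsilon} + \frac12 \psi *_x \rho_t \Bigr) \Big|_{t=0,\tau} \xrightarrow{\ \varepsilon \to 0\ } \int_{\mathbb{R}^{2d}} \rho_t \Bigl( f_\delta + \frac12 \psi *_x \rho_t \Bigr) \Big|_{t=0,\tau}. \]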
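Now consider the double integral in $\mathcal{J}(\rho, f_{\delta,\varepsilon})$. Using the estimate (81) with the choice 

\[ \Phi = \partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 \]

we have 

\begin{align*}
&\limsup_{\varepsilon \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f_{\delta,\varepsilon} + \mathcal{L}_{\rho_t} f_{\delta,\varepsilon} + \frac{\gamma^2}{2} |\nabla_p f_{\delta,\varepsilon}|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \\
&\quad \le \limsup_{\varepsilon \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \eta_\varepsilon *_t \Bigl[ \partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 \Bigr] + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \\
&\qquad + \limsup_{\varepsilon \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \gamma \bigl( J \nabla \psi *_x \rho_t \cdot \nabla f_{\delta,\varepsilon} - \eta_\varepsilon *_t (J \nabla \psi *_x \rho_t \cdot \nabla f_\delta) \bigr)\, d\rho_t\, dt.
\end{align*}

Since $t \mapsto \nabla \psi *_x \rho_t$ is continuous (see Lemma B.1), it follows that for all $x \in \mathbb{R}^{2d}$ 

\[ t \mapsto \int_{\mathbb{R}} \eta_\varepsilon(t - s) \bigl[ J \nabla \psi *_x \rho_t - J \nabla \psi *_x \rho_s \bigr] \cdot \nabla f_\delta(s,x)\, ds \to 0 \quad \text{in } L^1(0,\tau). \]

Using this convergence along with (87) we find 

\begin{align*}
&\limsup_{\varepsilon \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f_{\delta,\varepsilon} + \mathcal{L}_{\rho_t} f_{\delta,\varepsilon} + \frac{\gamma^2}{2} |\nabla_p f_{\delta,\varepsilon}|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \\
&\quad \le \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt. \tag{89}
\end{align*}

Combining these terms and using $I(\rho) \ge \liminf_{\varepsilon \to 0} \mathcal{J}(\rho, f_{\delta,\varepsilon})$ we have 

\[ \int_{\mathbb{R}^{2d}} \rho \Bigl( f_\delta + \frac12 \psi *_x \rho \Bigr) \Big|_0^\tau - \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \le I(\rho). \tag{90} \]

Now we study the $\delta \to 0$ limit of (90). Using a similar analysis as before, the following convergence 
results hold in $L^1_{\mathrm{loc}}(\mathbb{R} \times \mathbb{R}^{2d})$ as $\delta \to 0$, 

\[ f_\delta \to f, \qquad \partial_t f_\delta \to \partial_t f, \qquad \nabla f_\delta \to \nabla f, \qquad \Delta_p f_\delta \to \Delta_p f. \]

Since $f_T = \phi \in C_c^\infty(\mathbb{R}^{2d})$ (see Theorem B.2) and therefore $f_{\delta,T} \to f_T$ everywhere, we have 

\[ \int_{\mathbb{R}^{2d}} \rho_\tau \Bigl( f_{\delta,\tau} + \frac12 \psi *_x \rho_\tau \Bigr) \xrightarrow{\ \delta \to 0\ } \int_{\mathbb{R}^{2d}} \rho_\tau \Bigl( f_\tau + \frac12 \psi *_x \rho_\tau \Bigr). \tag{91} \]

To pass to the limit in the right-hand side of inequality (89), we use the estimate (80) with the choice 
$\Phi = \Psi - \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t$ (see (85) for the definition of $\Psi$), which leads to 

\begin{align*}
&\limsup_{\delta \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \partial_t f_\delta + \mathcal{L}_{\rho_t} f_\delta + \frac{\gamma^2}{2} |\nabla_p f_\delta|^2 + \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \Bigr)\, d\rho_t\, dt \\
&\quad \le \limsup_{\delta \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \nu_\delta *_x \Psi - \bigl[ \nu_\delta *_x (\gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t) - \gamma\, J \nabla H \cdot \nabla \psi *_x \rho_t \bigr] \Bigr)\, d\rho_t\, dt \\
&\qquad + \limsup_{\delta \to 0} \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Bigl( \gamma \delta\, \|d^2 H\|_{L^\infty} \bigl( \nu_\delta *_x |\nabla f| + \gamma\, \nu_\delta *_x |\nabla_p f| \bigr) + \gamma \bigl( J \nabla \psi *_x \rho_t \cdot \nabla f_\delta - \nu_\delta *_x (J \nabla \psi *_x \rho_t \cdot \nabla f) \bigr) \Bigr)\, d\rho_t\, dt \\
&\quad = \int_0^\tau\!\!\int_{\mathbb{R}^{2d}} \Psi\, d\rho_t\, dt.
\end{align*}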
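The only term left is the single-integral term at $t = 0$. Instead of passing to the limit, here we estimate 
as follows: 

\[ \int_{\mathbb{R}^{2d}} \rho_0 \Bigl( f_{\delta,0} + \frac12 \psi *_x \rho_0 \Bigr) \le \mathcal{F}(\rho_0) + \log \int_{\mathbb{R}^{2d}} e^{f_{\delta,0} - H}\, dx - \log Z_H. \tag{92} \]

Let us first prove (92). Recall from the proof of Theorem B.2 that 

\[ f_0 = 2 \log g_0 \le 2 \log \alpha_2 + 2 \beta_2 T \sqrt{\omega_2 + H}, \]

where $\alpha_2, \beta_2, \omega_2$ are constants, and therefore 

\[ f_{\delta,0} = \nu_\delta *_x f_0 \le 2 \log \alpha_2 + \beta_2 T \bigl( \delta^2 \|D^2 \sqrt{\omega_2 + H}\|_{L^\infty} + 2 \sqrt{\omega_2 + H} \bigr). \tag{93} \]

To arrive at the estimate above we have used 

\[ \nu_\delta *_x f(x) = \int f(x - y)\, \nu_\delta(y)\, dy \le \int \Bigl( |f(x)| - \nabla f(x) \cdot y + \frac12 |y|^2 \|d^2 f\|_{L^\infty} \Bigr) \nu_\delta(y)\, dy \le |f(x)| + \frac12 \delta^2 \|d^2 f\|_{L^\infty}, \]

for any $f \in C^2(\mathbb{R}^{2d})$ and $\nu_\delta$ satisfying $\int \nu_\delta = 1$ and $\int x\, \nu_\delta(x)\, dx = 0$. 

Furthermore, using the growth conditions on $H = p^2/2m + V(q)$ (see (V1)) we find for the second 
derivative 

\[ \bigl\| d^2 \sqrt{\omega_2 + H} \bigr\|_{L^\infty} = \Bigl\| -\frac{\nabla H \otimes \nabla H}{4 (\omega_2 + H)^{3/2}} + \frac{d^2 H}{2 \sqrt{\omega_2 + H}} \Bigr\|_{L^\infty} < \infty, \]

and therefore (93) implies that $|f_{\delta,0}| \le C(1 + H)^{1/2}$. The estimate (92) then follows by using a truncated 
version of $f_{\delta,0}$ in the variational definition (75) of the free energy. 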

Substituting (92) into (90) we have 


ih + \^ 


*x Pn 


tdt 


-[ I {dtfs+^pfs + ^\^pfs\^+lJ^H-V^^^pAdpt 

='^ Jo JR2‘i Z J 

< I{p) + T{po)+\og [ - logZ//. (94) 

jR^'i 

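Using the bound $|f_{\delta,0}| \le C(1+\sqrt{H})$ and the dominated convergence theorem we find

$$\log\int_{\mathbb{R}^{2d}} e^{f_{\delta,0}-H} \xrightarrow{\ \delta\to0\ } \log\int_{\mathbb{R}^{2d}} e^{f_0-H},$$

and therefore passing to the limit $\delta \to 0$ in (94) gives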


[ Pr(fr + 1 ;^ *x Pr) - f [ \ dt f + ^pf + +-f JV H ■ Vip Pt] dptdt 

jR^'i \ Z ^ Jo JK2<i ^ J 

< I{p)+ T{po) + log [ ef°~^dx - log Zh- 

jR2ti 


□ 


We are now ready to prove Theorem 2.3. 

Proof of Theorem 2.3. Combining (83) with equation (76a) we have 



*x Pt 


< I{p) + T{po) + log 


e* ^dx — log Zh 

- 7 ^/ [ (Apip-VpH-Vp(p-hVpip\‘^)dptdt. 

Jo JR2‘i ^ Z / 
Substituting this relation into the formula (75) for the free energy, and using $f|_{t=T} = \varphi$, we find
T{pr)= sup [ ((()+ Pr)pr - log / e'*’~ ^ dx + \og Z H 

0GC“(R2<i) ^ ^ ' jR2d 

< sup /(p) + J'(PoIm) + log / e^°~^dx — log / e'^~^dx 

0eC“(R2‘i) JM^d J^2d 

- 7 ^/ [ (Ap(p-VpH-Vpp-l-\Vpp\‘^)dptdt. 

Jo jR2<i ^ i / 

Rearranging and using (77) this becomes 

J^iPr)+l^[ f (App-VpH ■Vpp-hVpp\‘^)dptdt<I{p)+T{po\p). (95) 

Jo Jr^‘‘ ^ z / 

Taking the supremum over $\varphi \in C_c^\infty(\mathbb{R}\times\mathbb{R}^{2d})$ and using a standard argument, based on the $C_b^2$-separability of $C_c^\infty$, we can move the supremum inside the time integral, and the definition of the relative Fisher Information (24) then gives

J'iPT) + Yj IiPt\p)dt < J^ipo) + I{p). 

This completes the proof. □ 


C Properties of the auxiliary PDE 


In this appendix we will study the following equation in $[0,T]\times\mathbb{R}^{2d}$:

$$\partial_t g - J\nabla H\cdot\nabla g - J\nabla(\psi*\rho_t)\cdot\nabla g + \nabla_pH\cdot\nabla_p g - \Delta_p g - \frac{g}{2}\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big) = U, \qquad g|_{t=0} = g^0. \tag{96}$$

In addition to providing well-posedness results (see Section C.1), in this section we also prove certain important properties of this equation, such as a comparison principle and bounds at infinity (see Section C.2).

Equation (78) is a special case of (96) with the choice

$$U = 0, \qquad \Psi = -\Big(\Delta_p\varphi - \nabla_p\varphi\cdot\nabla_pH - \tfrac12|\nabla_p\varphi|^2\Big).$$

Here and in the rest of this appendix we set $\gamma = 1$, since the value of $\gamma$ plays no role in the discussion.

The results of this appendix are a generalization of [Deg86, Appendix A]. In that reference Degond treats the case of equation (96) without on-site and interaction potentials and without the friction term $\nabla_pH\cdot\nabla_p g$. We generalize the equation, while closely following his line of argument, and prove what are essentially similar results.

The main difference in our treatment is the introduction of a weighted functional setting for equation (96), in which the $L^2$-spaces, Sobolev spaces, and the weak formulation of the equation are all given a weight function $e^{-H}$. The choice of this weight function is closely connected to the fact that $e^{-H}$ is a stationary measure both for the convective part of the equation, $J\nabla H\cdot\nabla g$, and for the Ornstein-Uhlenbeck dissipative part, $\nabla_pH\cdot\nabla_p g - \Delta_p g$. This weighted setting has the advantage of effectively eliminating all the unbounded coefficients in the equation.
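The effect of the weight can be checked numerically in the simplest possible setting. The sketch below is only an illustration under assumptions that are not taken from the text: a single momentum variable, the toy Hamiltonian $H(p) = p^2/2$, and the hypothetical pair $g(p) = \sin p$, $\phi(p) = p$. It compares the strong form $\int (H'g' - g'')\,\phi\,e^{-H}$ of the Ornstein-Uhlenbeck part with the weak form $\int g'\phi'\,e^{-H}$, in which the unbounded coefficient $H'$ no longer appears.

```python
import math

def trapezoid(func, a, b, n):
    """Composite trapezoid rule for func on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

# Assumed toy setting (not from the paper): H(p) = p^2 / 2 in one dimension.
def weight(p):
    return math.exp(-0.5 * p * p)          # e^{-H(p)}

# Hypothetical functions: g(p) = sin(p), phi(p) = p.
def strong_form(p):
    # (H'(p) g'(p) - g''(p)) * phi(p) * e^{-H(p)}
    return (p * math.cos(p) + math.sin(p)) * p * weight(p)

def weak_form(p):
    # g'(p) * phi'(p) * e^{-H(p)}
    return math.cos(p) * 1.0 * weight(p)

# The Gaussian weight makes the truncation to [-12, 12] harmless.
lhs = trapezoid(strong_form, -12.0, 12.0, 4000)
rhs = trapezoid(weak_form, -12.0, 12.0, 4000)
# Closed form of the weak side: integral of cos(p) e^{-p^2/2} over the real line.
exact = math.sqrt(2.0 * math.pi) * math.exp(-0.5)
print(lhs, rhs, exact)
```

Both quadratures agree with each other and with the closed-form value, which is the one-dimensional counterpart of the integration by parts used in the weak formulation.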


C.I Well-posedness 

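Following Degond [Deg86] we introduce a change of variable

$$g \mapsto e^{\lambda t} g, \qquad\text{with } \lambda > \tfrac12\|\Psi\|_\infty + 1, \tag{97}$$

which transforms (96) into

$$\partial_t g - J\nabla H\cdot\nabla g - J\nabla(\psi*\rho_t)\cdot\nabla g + \nabla_pH\cdot\nabla_p g - \Delta_p g - \frac{g}{2}\,J\nabla H\cdot\nabla\psi*\rho_t + \Big(\lambda + \tfrac12\Psi\Big) g = e^{-\lambda t}U, \qquad g|_{t=0} = g^0. \tag{98}$$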

In what follows we will study the well-posedness of (98), and at the end of the section we will extrapolate 
the results to (96). 

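Let us formally derive the weak formulation for (98). Multiplying with a test function $\phi \in C_c^\infty([0,T)\times\mathbb{R}^{2d})$ and a weight $e^{-H}$, and using integration by parts, for the left-hand side of (98) we get

$$\int_0^T\!\!\int_{\mathbb{R}^{2d}} \phi\Big(\partial_t g - J\nabla H\cdot\nabla g - J\nabla(\psi*\rho_t)\cdot\nabla g + \nabla_pH\cdot\nabla_p g - \Delta_p g - \frac{g}{2}J\nabla H\cdot\nabla\psi*\rho_t + \big(\lambda+\tfrac12\Psi\big)g\Big)e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{ g\Big(-\partial_t\phi + J\nabla H\cdot\nabla\phi + \tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \big(\lambda+\tfrac12\Psi\big)\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi \Big\} e^{-H} - \int_{\mathbb{R}^{2d}} g\phi\big|_{t=0}\,e^{-H}.$$

The weight $e^{-H}$ causes cancellation of certain terms after integration by parts, as for instance for the two convolution terms,

$$\int_0^T\!\!\int_{\mathbb{R}^{2d}} \phi\Big(-J\nabla\psi*\rho_t\cdot\nabla g - \tfrac12 g\,J\nabla H\cdot\nabla\psi*\rho_t\Big)e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(-\tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g - \tfrac12\phi g\,J\nabla H\cdot\nabla\psi*\rho_t\Big)e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(-\tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \tfrac12 g\,J\nabla\psi*\rho_t\cdot\nabla\phi + \tfrac12\phi g\,J\nabla H\cdot\nabla\psi*\rho_t - \tfrac12\phi g\,J\nabla H\cdot\nabla\psi*\rho_t\Big)e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(-\tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \tfrac12 g\,J\nabla\psi*\rho_t\cdot\nabla\phi\Big)e^{-H}.$$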

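These calculations suggest that we seek weak solutions in the space

$$X := \Big\{ g \in L^2\big(0,T;L^2(\mathbb{R}^{2d};e^{-H})\big) : \nabla_p g \in L^2\big(0,T;L^2(\mathbb{R}^{2d};e^{-H})\big) \Big\}, \tag{99}$$

endowed with the norm

$$\|g\|_X^2 := \|g\|^2_{L^2(L^2(e^{-H}))} + \|\nabla_p g\|^2_{L^2(L^2(e^{-H}))}.$$

The subscript in the norm is shorthand notation for $L^2(0,T;L^2(\mathbb{R}^{2d};e^{-H}))$. Note that $C_c^\infty((0,T)\times\mathbb{R}^{2d})$ is dense in $X$.

We will use $\|\cdot\|_{L^2}$ to indicate the $L^2$ norm without any weight, and $\langle\cdot,\cdot\rangle_{X',X}$ for the dual bracket between $X'$ (the dual of $X$) and $X$.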

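For any $g \in X$ we can consider the combination $\partial_t g - J\nabla H\cdot\nabla g$ as a linear form on $C_c^\infty((0,T)\times\mathbb{R}^{2d})$ by interpreting the derivatives in the sense of distributions:

$$\langle \partial_t g - J\nabla H\cdot\nabla g, \phi\rangle := -\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\,(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H} \qquad\text{for } \phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d}).$$

Note that the weight function $e^{-H}$ yields no extra terms upon partial integration. If this linear form is bounded in the $X$-norm, i.e. if the norm

$$\|\partial_t g - J\nabla H\cdot\nabla g\|_{X'} := \sup\Big\{ \int_0^T\!\!\int_{\mathbb{R}^{2d}} g\,(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H} : \phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d}),\ \|\phi\|_X = 1 \Big\}$$

is finite, then $\partial_t g - J\nabla H\cdot\nabla g \in X'$. We define $Y$ to be the space of such functions $g$:

$$Y := \big\{ g \in X : \partial_t g - J\nabla H\cdot\nabla g \in X' \big\}, \qquad\text{with norm } \|g\|_Y := \|g\|_X + \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}. \tag{100}$$

We now define the variational equation (which is a weak form of (98)) to be

$$E_\lambda(g,\phi) = L_\lambda(\phi), \qquad \forall\phi \in C_c^\infty([0,T)\times\mathbb{R}^{2d}), \tag{101}$$

where $E_\lambda : X\times C_c^\infty([0,T)\times\mathbb{R}^{2d}) \to \mathbb{R}$ and $L_\lambda : C_c^\infty([0,T)\times\mathbb{R}^{2d}) \to \mathbb{R}$ are given by

$$E_\lambda(g,\phi) := \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{ g\Big(-\partial_t\phi + J\nabla H\cdot\nabla\phi + \tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \big(\lambda+\tfrac12\Psi\big)\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi \Big\} e^{-H}, \tag{102}$$

$$L_\lambda(\phi) := \langle e^{-\lambda t}U, \phi\rangle_{X',X} + \int_{\mathbb{R}^{2d}} g^0\,\phi|_{t=0}\,e^{-H}. \tag{103}$$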

We use the subscript $\lambda$ to indicate that the variational equation (101) corresponds to the transformed equation (98).

We now state our main result. 

Theorem C.1 (Well-posedness). Assume that

$$\Psi \in C_b(\mathbb{R}\times\mathbb{R}^{2d}), \qquad U \in X', \qquad\text{and}\qquad g^0 \in L^2(\mathbb{R}^{2d};e^{-H}).$$

Then there exists a unique solution $g$ in $Y$ to the variational equation (101). Furthermore the solution $g$ satisfies the initial condition in the sense of traces in $L^2(\mathbb{R}^{2d};e^{-H})$.

To prove Theorem C.1, we require certain properties of $Y$. In the first lemma below, we prove an auxiliary result concerning the commutator of a mollification with a multiplication. In the second lemma we prove that $C_c^\infty([0,T]\times\mathbb{R}^{2d})$ is dense in $Y$. In order to give meaning to the initial conditions (as required in Theorem C.1) we need to prove a trace theorem. We prove this trace theorem and a Green formula (which gives meaning to 'integration by parts') in the third lemma. At the end of this section we prove Theorem C.1.

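Lemma C.2. Define $\nu_\delta(x) := \delta^{-n}\nu(x/\delta)$ for some $\nu \in C_c^\infty(\mathbb{R}^n)$, and consider $f \in W^{1,q}(\mathbb{R}^n;\mathbb{R}^n)$, $h \in W^{1,r}(\mathbb{R}^n)$, where $1 \le q, r \le \infty$ and $1 \le p < \infty$ satisfies $\frac1p = \frac1q + \frac1r$. Then for any $\delta > 0$ we have

$$\big\|\nu_\delta*(f\cdot\nabla h) - f\cdot\nu_\delta*\nabla h\big\|_{L^p} \le \|h\|_{L^r}\Big(\|\nabla f\|_{L^q}\int_{\mathbb{R}^n}|z|\,|\nabla\nu(z)|\,dz + \|\nu\|_{L^1}\|\operatorname{div} f\|_{L^q}\Big). \tag{104}$$

Proof. The argument of the norm on the left hand side of (104) is

$$\big(\nu_\delta*(f\cdot\nabla h) - f\cdot\nu_\delta*\nabla h\big)(x) = \int \nu_\delta(x-y)\,[f(y)-f(x)]\cdot\nabla h(y)\,dy$$

$$= -\int \Big(\nabla\nu_\delta(x-y)\cdot[f(x)-f(y)] + \nu_\delta(x-y)\operatorname{div} f(y)\Big)h(y)\,dy =: -(I + II).$$

Using Young's and Hölder's inequalities on the second term gives

$$\|II\|_{L^p} = \|\nu_\delta*(h\operatorname{div} f)\|_{L^p} \le \|\nu_\delta\|_{L^1}\|h\operatorname{div} f\|_{L^p} \le \|\nu_\delta\|_{L^1}\|h\|_{L^r}\|\operatorname{div} f\|_{L^q}.$$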
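For the first term we calculate, writing $\kappa_\delta(z) := |z|\,|\nabla\nu_\delta(z)|$ and $k := \|\kappa_\delta\|_{L^1}$, that for any $\alpha > 0$

$$|I(x)|^p = \Big|\int \nabla\nu_\delta(x-y)\cdot[f(x)-f(y)]\,h(y)\,dy\Big|^p \le k^{p-1}\int \kappa_\delta(x-y)\,\frac{|f(x)-f(y)|^p}{|x-y|^p}\,|h(y)|^p\,dy$$

$$\le k^{p-1}\Big[\frac{\alpha^{q/p}p}{q}\int \kappa_\delta(x-y)\,\frac{|f(x)-f(y)|^q}{|x-y|^q}\,dy + \frac{p}{r\,\alpha^{r/p}}\int \kappa_\delta(x-y)\,|h(y)|^r\,dy\Big],$$

and therefore

$$\|I\|^p_{L^p} \le k^{p-1}\Big[\frac{\alpha^{q/p}p}{q}\int\!\!\int \kappa_\delta(x-y)\,\frac{|f(x)-f(y)|^q}{|x-y|^q}\,dy\,dx + \frac{p}{r\,\alpha^{r/p}}\int\!\!\int \kappa_\delta(x-y)\,|h(y)|^r\,dy\,dx\Big] \le k^p\Big[\frac{\alpha^{q/p}p}{q}\|\nabla f\|^q_{L^q} + \frac{p}{r\,\alpha^{r/p}}\|h\|^r_{L^r}\Big].$$

By optimizing over $\alpha$ we find

$$\|I\|_{L^p} \le k\,\|\nabla f\|_{L^q}\|h\|_{L^r} = \|\kappa_\delta\|_{L^1}\|\nabla f\|_{L^q}\|h\|_{L^r}.$$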

Combining these estimates and using

$$\int |z|\,|\nabla\nu_\delta(z)|\,dz = \delta^{-n-1}\int |z|\,\big|\nabla\nu(z/\delta)\big|\,dz = \int |z|\,|\nabla\nu(z)|\,dz \tag{105}$$

we obtain the claimed result. □
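The scaling identity (105) is what makes the commutator bound uniform in $\delta$. As a hedged illustration, the sketch below checks it numerically in one space dimension ($n = 1$) for the standard bump function $\nu(x) = \exp(-1/(1-x^2))$ on $(-1,1)$; the choice of kernel, the midpoint rule, and all function names are assumptions made for this example only.

```python
import math

def bump(x):
    """Standard smooth bump supported in (-1, 1); normalisation is irrelevant here."""
    if abs(x) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - x * x))

def bump_prime(x):
    """Derivative of bump, computed analytically."""
    if abs(x) >= 1.0:
        return 0.0
    return bump(x) * (-2.0 * x) / (1.0 - x * x) ** 2

def first_moment_of_gradient(delta, n=20000):
    """Midpoint rule for the integral of |z| |nu_delta'(z)| over (-delta, delta),
    where nu_delta(z) = nu(z / delta) / delta in dimension one."""
    h = 2.0 * delta / n
    total = 0.0
    for i in range(n):
        z = -delta + (i + 0.5) * h
        grad = bump_prime(z / delta) / delta ** 2    # nu_delta'(z)
        total += abs(z) * abs(grad) * h
    return total

# The value should not depend on delta.
values = [first_moment_of_gradient(d) for d in (1.0, 0.5, 0.1, 0.01)]
print(values)
```

The printed values coincide up to rounding, in line with the change of variables $z \mapsto z/\delta$ behind (105).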


Lemma C.3. Let $Y$ be the space defined in (100). Then $C_c^\infty([0,T]\times\mathbb{R}^{2d})$ is dense in $Y$.

Proof. We prove this lemma in two steps. In the first step we approximate functions in Y by spatially 
compactly supported functions. In the second step we approximate functions in Y with spatially compact 
support by smooth functions. 

In both steps we construct an approximating sequence that converges strongly in X and weakly in X'; 
it then follows from Mazur’s lemma that a convex combination of this sequence converges strongly in both 
X and X', and therefore in Y. 

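Step 1. For an arbitrary $g \in Y$, define $g_R(t,x) := g(t,x)\,\chi_R(\sqrt{H(x)})$, where $\chi_R \in C_c^\infty(\mathbb{R})$ is given by

$$\chi_R(x) = \begin{cases} 1, & |x| < R, \\ 0, & |x| > 2R, \end{cases} \qquad\text{with } \|\nabla\chi_R\|_\infty \le \frac{C}{R}. \tag{106}$$

Note that $g_R$ is compactly supported in $\mathbb{R}^{2d}$. Using the dominated convergence theorem we find

$$\|g_R - g\|_X^2 \le 2\int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big[(1-\chi_R)^2\big(g^2 + |\nabla_p g|^2\big) + g^2|\nabla_p\chi_R|^2\Big]e^{-H} \xrightarrow{R\to\infty} 0.$$

Here $\chi_R$ is shorthand for $\chi_R\circ\sqrt{H}$, and we have used $|\nabla H|^2 \le C(1+H)$ and the estimate

$$|\nabla_p\chi_R|^2 = \big(\chi_R'(\sqrt{H})\big)^2\,\frac{|\nabla_pH|^2}{4H} \le C.$$

To conclude the first part of this proof we need to show that

$$\langle \partial_t g_R - J\nabla H\cdot\nabla g_R, \phi\rangle_{X',X} \xrightarrow{R\to\infty} \langle \partial_t g - J\nabla H\cdot\nabla g, \phi\rangle_{X',X}, \qquad \forall\phi \in X. \tag{107}$$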
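Let $\phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$. Then

$$\big|\langle \partial_t g_R - J\nabla H\cdot\nabla g_R, \phi\rangle_{X',X}\big| = \Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} g_R\,(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H}\Big| \le \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}\|\phi\|_X,$$

where we have used $J\nabla H\cdot\nabla(\chi_R\circ\sqrt{H}) = 0$ to arrive at the final inequality. As a result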

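$$\|\partial_t g_R - J\nabla H\cdot\nabla g_R\|_{X'} \le \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}, \tag{108}$$

and using the dominated convergence theorem we find

$$\langle \partial_t g_R - J\nabla H\cdot\nabla g_R, \phi\rangle_{X',X} \xrightarrow{R\to\infty} \langle \partial_t g - J\nabla H\cdot\nabla g, \phi\rangle_{X',X}, \qquad \forall\phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d}). \tag{109}$$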

Estimate (108) together with the convergence statement (109) implies that (107) holds. As mentioned above, 
Mazur’s lemma then gives the existence of a sequence that converges strongly in Y. 
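Step 1 relies on the fact that any function of $H$ alone is transported trivially by the Hamiltonian flow, which follows from the antisymmetry of $J$. The following sketch checks this numerically; it is an illustration under stated assumptions only: one degree of freedom, the toy Hamiltonian $H(q,p) = p^2/2 + q^4/4 + 1$, the convention $J = \begin{pmatrix}0 & 1\\ -1 & 0\end{pmatrix}$, and a hypothetical smooth profile standing in for $\chi_R$.

```python
import math

# Assumed toy Hamiltonian (not the paper's potential): H >= 1, so sqrt(H) is smooth.
def hamiltonian(q, p):
    return 0.5 * p * p + 0.25 * q ** 4 + 1.0

def grad_hamiltonian(q, p):
    return (q ** 3, p)                       # (dH/dq, dH/dp)

def apply_J(v):
    """Canonical symplectic matrix J = [[0, 1], [-1, 0]] applied to v (assumed convention)."""
    return (v[1], -v[0])

def cutoff(s):
    """Hypothetical smooth stand-in for chi_R; any differentiable profile works."""
    return 1.0 / (1.0 + s * s)

def composed(q, p):
    return cutoff(math.sqrt(hamiltonian(q, p)))

def grad_central(func, q, p, h=1e-5):
    """Central finite-difference gradient of func at (q, p)."""
    dq = (func(q + h, p) - func(q - h, p)) / (2.0 * h)
    dp = (func(q, p + h) - func(q, p - h)) / (2.0 * h)
    return (dq, dp)

def transport_of_cutoff(q, p):
    """Evaluate J grad H . grad(cutoff o sqrt(H)) at the point (q, p)."""
    jgh = apply_J(grad_hamiltonian(q, p))
    gc = grad_central(composed, q, p)
    return jgh[0] * gc[0] + jgh[1] * gc[1]

samples = [(0.3, -1.2), (1.5, 0.7), (-2.0, 2.5), (0.0, 0.4)]
residuals = [abs(transport_of_cutoff(q, p)) for q, p in samples]
print(residuals)
```

The residuals vanish up to finite-difference error, since $\nabla(\chi\circ\sqrt{H})$ is parallel to $\nabla H$ and $J\nabla H\cdot\nabla H = 0$.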

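Step 2. In this step we approximate spatially compactly supported functions $g \in Y$ by smooth functions. Using a partition of unity (in time), it is sufficient to consider

$$A := \big\{ g \in Y : g \text{ has compact support in } [0,T)\times\mathbb{R}^{2d} \big\}.$$

We will show that these functions can be approximated by functions in $C_c^\infty([0,T)\times\mathbb{R}^{2d})$.

For any $g \in A$, we define its translation to the left in time over $\tau > 0$ as $g_\tau(t,x) := g(t+\tau,x)$. Furthermore define $g_{\tau,\delta} := \nu_\delta * g_\tau$, where $\nu_\delta$ is a symmetric regularising sequence in $\mathbb{R}\times\mathbb{R}^{2d}$. Note that $g_{\tau,\delta} \in C_c^\infty([0,T)\times\mathbb{R}^{2d})$ when $\delta$ is small enough. Using standard results it follows that $g_{\tau,\delta} \to g$ as $\tau,\delta \to 0$ in $X$. We will now show that

$$\big|\langle \partial_t g_{\tau,\delta} - J\nabla H\cdot\nabla g_{\tau,\delta}, \phi\rangle_{X',X}\big| \le C\|g\|_X\|\phi\|_X + \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}\|\phi\|_X, \tag{110}$$

where $C$ is independent of $\tau$ and $\delta$ and of the test function $\phi$. For any $\phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$,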

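$$\langle \partial_t g_{\tau,\delta} - J\nabla H\cdot\nabla g_{\tau,\delta}, \phi\rangle_{X',X} = -\int_0^T\!\!\int_{\mathbb{R}^{2d}} (\nu_\delta * g_\tau)(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H}$$

$$= -\int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\nu_\delta*(\partial_t\phi\,e^{-H}) + \nu_\delta*(J\nabla e^{-H}\cdot\nabla\phi)\Big] \tag{111}$$

$$= -\int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\partial_t(\nu_\delta*\phi) - J\nabla H\cdot(\nu_\delta*\nabla\phi)\Big]e^{-H} - \int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\nu_\delta*(\partial_t\phi\,e^{-H}) - (\nu_\delta*\partial_t\phi)\,e^{-H}\Big] - \int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\nu_\delta*(J\nabla e^{-H}\cdot\nabla\phi) - J\nabla e^{-H}\cdot(\nu_\delta*\nabla\phi)\Big]. \tag{112}$$

We now estimate each term in the right hand side of (112). For the first term, writing $\phi_\delta := \nu_\delta*\phi$, extending the time integral to $\mathbb{R}$ and using a change of variables we find

$$-\int_{\mathbb{R}}\!\int_{\mathbb{R}^{2d}} g(t+\tau,x)\big(\partial_t\phi_\delta - J\nabla H\cdot\nabla\phi_\delta\big)(t,x)\,e^{-H(x)}\,dx\,dt$$

$$= -\int_{\mathbb{R}}\!\int_{\mathbb{R}^{2d}} g(s,x)\big(\partial_t\phi_\delta - J\nabla H\cdot\nabla\phi_\delta\big)(s-\tau,x)\,e^{-H(x)}\,dx\,ds$$

$$= -\int_{\mathbb{R}}\!\int_{\mathbb{R}^{2d}} g(s,x)\Big(\partial_s\big(\eta(s)\phi_\delta(s-\tau,x)\big) - J\nabla H\cdot\nabla\big(\eta(s)\phi_\delta(s-\tau,x)\big)\Big)e^{-H(x)}\,dx\,ds$$

$$\le \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}\,\|\phi_\delta(\cdot-\tau,\cdot)\,\eta\|_X.$$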
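Here $\eta \in C_c^\infty([0,T))$ is any smooth function satisfying $0 \le \eta \le 1$ and $\eta(t) = 1$ for $t \in \operatorname{supp} g$, and the final inequality follows by the definition of $Y$ and $\phi_\delta(\cdot-\tau,\cdot)\,\eta \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$. Using $\eta \le 1$ and a change of variable we obtain

$$\|\phi_\delta(\cdot-\tau,\cdot)\,\eta\|_X^2 \le \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(|\phi_\delta(t-\tau,x)|^2 + |\nabla_p\phi_\delta(t-\tau,x)|^2\Big)e^{-H} \le \|\phi\|_X^2,$$

and therefore for the first term on the right hand side of (112) we have

$$\Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} g(t+\tau,x)\big(\partial_t(\nu_\delta*\phi) - J\nabla H\cdot(\nu_\delta*\nabla\phi)\big)(x,t)\,e^{-H(x)}\,dx\,dt\Big| \le \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'}\|\phi\|_X.$$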

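For the final term in the right hand side of (112), using $\operatorname{div}(J\nabla e^{-H}) = 0$ and applying Lemma C.2 with $f = J\nabla e^{-H}$, $h = \phi$ and $r = p = 2$, $q = \infty$, we find

$$\Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\nu_\delta*(J\nabla e^{-H}\cdot\nabla\phi) - J\nabla e^{-H}\cdot(\nu_\delta*\nabla\phi)\Big]\Big| \le \|g_\tau\|_{L^2(S)}\,\big\|\nu_\delta*(J\nabla e^{-H}\cdot\nabla\phi) - J\nabla e^{-H}\cdot(\nu_\delta*\nabla\phi)\big\|_{L^2(S)}$$

$$\le C\,\|g_\tau\|_{L^2(S)}\,\|D^2e^{-H}\|_{L^\infty(S)}\,\|\phi\|_{L^2(S)} \le \frac{C}{\alpha}\,\|D^2e^{-H}\|_{L^\infty(S)}\,\|g\|_X\|\phi\|_X.$$

Here $S := \operatorname{supp} g$, $D^2e^{-H}$ is the Hessian of $e^{-H}$ and $\alpha := \inf_{x\in S} e^{-H(x)} > 0$. Repeating a similar calculation for the second term on the right hand side of (112), we find

$$\Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} g_\tau\Big[\nu_\delta*(\partial_t\phi\,e^{-H}) - (\nu_\delta*\partial_t\phi)\,e^{-H}\Big]\Big| \le C\|g\|_X\|\phi\|_X.$$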


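Combining all the terms we find (110). As a result, $\|\partial_t g_{\tau,\delta} - J\nabla H\cdot\nabla g_{\tau,\delta}\|_{X'}$ is bounded independently of $\tau$ and $\delta$. Using the dominated convergence theorem we also have for all $\phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$

$$\forall\tau > 0:\quad \langle \partial_t g_{\tau,\delta} - J\nabla H\cdot\nabla g_{\tau,\delta}, \phi\rangle_{X',X} \xrightarrow{\delta\to0} \langle \partial_t g_\tau - J\nabla H\cdot\nabla g_\tau, \phi\rangle_{X',X}, \quad\text{and}$$

$$\langle \partial_t g_\tau - J\nabla H\cdot\nabla g_\tau, \phi\rangle_{X',X} \xrightarrow{\tau\to0} \langle \partial_t g - J\nabla H\cdot\nabla g, \phi\rangle_{X',X}.$$

Taking two sequences $\tau_n \to 0$ and $\delta_n \to 0$ such that the translation and convolution operations above are allowed, we use the boundedness of $\partial_t g_{\tau,\delta} - J\nabla H\cdot\nabla g_{\tau,\delta}$ in the separable space $X'$ to extract a subsequence that converges in the weak-star topology; we then use the density of $C_c^\infty((0,T)\times\mathbb{R}^{2d})$ in $X$ and the convergence of $g_{\tau,\delta}$ to identify the limit. Again using Mazur's lemma it follows that there exists a strongly converging sequence. This concludes the proof of the lemma.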

□ 


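Lemma C.4. Let $g \in Y$. Then $g$ admits (continuous) time trace values in $L^2(e^{-H})$. Furthermore, for any $g, \tilde g \in Y$ we have

$$\langle \partial_t g - J\nabla H\cdot\nabla g, \tilde g\rangle_{X',X} + \langle \partial_t\tilde g - J\nabla H\cdot\nabla\tilde g, g\rangle_{X',X} = \int_{\mathbb{R}^{2d}} g\tilde g\,e^{-H}\Big|_{t=0}^{t=T}. \tag{113}$$

Proof. We will prove that the mapping

$$C_c^\infty([0,T]\times\mathbb{R}^{2d}) \ni g \mapsto (g(0), g(T)) \in L^2(e^{-H})\times L^2(e^{-H})$$

can be continuously extended to $Y$. This implies that any $f \in Y$ admits trace values in $L^2(e^{-H})$, since $C_c^\infty([0,T]\times\mathbb{R}^{2d})$ is dense in $Y$ by Lemma C.3. The proof of (113) follows by applying integration by parts to smooth functions and then passing to the limit in $Y$.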
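Consider $\eta \in C^\infty([0,T])$ with $0 \le \eta \le 1$, $\eta(t) = 1$ for $t \in [0,T/3]$, and $\eta(t) = 0$ for $t \in [2T/3,T]$. We have for any $g \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$

$$\|g|_{t=0}\|^2_{L^2(e^{-H})} = -2\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\eta\,\partial_t(g\eta)\,e^{-H}$$

$$= -2\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\eta\,\big(\partial_t(g\eta) - J\nabla H\cdot\nabla(g\eta)\big)e^{-H} - 2\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\eta\,J\nabla H\cdot\nabla(g\eta)\,e^{-H}$$

$$= -2\big\langle(\partial_t - J\nabla H\cdot\nabla)(g\eta), g\eta\big\rangle_{X',X} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} J\nabla e^{-H}\cdot\nabla(g^2\eta^2)$$

$$= -2\big\langle(\partial_t - J\nabla H\cdot\nabla)(g\eta), g\eta\big\rangle_{X',X} \le 2\,\|(\partial_t - J\nabla H\cdot\nabla)(g\eta)\|_{X'}\,\|g\eta\|_X, \tag{114}$$

where the final equality follows by the anti-symmetry of $J$. Note that $\|g\eta\|_X \le \|g\|_X$. Furthermore

$$\|(\partial_t - J\nabla H\cdot\nabla)(g\eta)\|_{X'} = \sup_{\substack{\phi\in C_c^\infty((0,T)\times\mathbb{R}^{2d})\\ \|\phi\|_X = 1}} \int_0^T\!\!\int_{\mathbb{R}^{2d}} g\eta\,(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H}$$

$$= \sup_\phi \Big[\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\,\big(\partial_t(\phi\eta) - J\nabla H\cdot\nabla(\phi\eta)\big)e^{-H} - \int_0^T\!\!\int_{\mathbb{R}^{2d}} g\phi\,\partial_t\eta\,e^{-H}\Big]$$

$$\le \|\partial_t g - J\nabla H\cdot\nabla g\|_{X'} + \|\partial_t\eta\|_\infty\|g\|_X \le C\|g\|_Y.$$

Substituting back into (114) we find

$$\|g|_{t=0}\|_{L^2(e^{-H})} \le C\|g\|_Y,$$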

which completes the proof for the initial time. The proof for the final time proceeds similarly. □ 

Now we are ready to prove Theorem C.1. We will make use of a result of Lions [Lio61], which we state here for convenience.

Theorem C.5. Let $F$ be a Hilbert space, equipped with a norm $\|\cdot\|_F$ and an inner product $(\cdot,\cdot)$. Let $\Theta$ be a subspace of $F$, provided with a prehilbertian norm $\|\cdot\|_\Theta$, such that the injection $\Theta \hookrightarrow F$ is continuous. Consider a bilinear form $E$:

$$E : F\times\Theta \ni (g,\phi) \mapsto E(g,\phi) \in \mathbb{R}$$

such that $E(\cdot,\phi)$ is continuous on $F$ for any fixed $\phi \in \Theta$, and such that

$$|E(\phi,\phi)| \ge \alpha\|\phi\|^2_\Theta, \qquad \forall\phi \in \Theta, \text{ with } \alpha > 0. \tag{115}$$

Then, given a continuous linear form $L$ on $\Theta$, there exists a solution $g$ in $F$ of the problem

$$E(g,\phi) = L(\phi), \qquad \forall\phi \in \Theta.$$

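Proof of Theorem C.1. We will use Theorem C.5 to show the existence of a solution to the variational equation (101). We choose $F = X$ and $\Theta = C_c^\infty([0,T)\times\mathbb{R}^{2d})$ with

$$\|\phi\|^2_\Theta = \|\phi\|^2_X + \tfrac12\,\|\phi|_{t=0}\|^2_{L^2(e^{-H})}.$$

By definition $\Theta \hookrightarrow X$.

The bilinear form $E_\lambda$ defined in (102) satisfies property (115), since

$$E_\lambda(\phi,\phi) = \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{\phi\Big(-\partial_t\phi + J\nabla H\cdot\nabla\phi + \tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \big(\lambda+\tfrac12\Psi\big)\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla\phi + |\nabla_p\phi|^2\Big\}e^{-H}$$

$$\ge \tfrac12\,\|\phi|_{t=0}\|^2_{L^2(e^{-H})} + \min\big\{1, \lambda - \tfrac12\|\Psi\|_\infty\big\}\|\phi\|^2_X \ge \|\phi\|^2_\Theta,$$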
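where we have used (97).

Since all the conditions of Theorem C.5 are satisfied, the variational equation (101) admits a solution $g$ in $X$. For $\phi \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$ we have

$$\Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\,(\partial_t\phi - J\nabla H\cdot\nabla\phi)\,e^{-H}\Big| = \Big|\int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(\tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \big(\lambda+\tfrac12\Psi\big)\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi\Big\}e^{-H} - L_\lambda(\phi)\Big| \le C\|g\|_X\|\phi\|_X,$$

where we have used $J\nabla\psi*\rho_t\cdot\nabla\phi = -\nabla_q\psi*\rho_t\cdot\nabla_p\phi$. Note that $C > 0$ is independent of $\phi$, and therefore the solution $g$ belongs to $Y$.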

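Next we show that $g^0$ appearing in the definition of $L_\lambda$ in (103) is the initial value for the solution $g$ of (101). Choose $\phi(t,x) = \zeta_\varepsilon(t)\,\bar\phi(x)$, where $\bar\phi \in C_c^\infty(\mathbb{R}^{2d})$ and the sequence $\zeta_\varepsilon$ satisfies $\zeta_\varepsilon(0) = 1$, $\zeta_\varepsilon(t) \to 0$ for any $t \in (0,T)$ and $\zeta_\varepsilon' \to -\delta_0$ (Dirac delta at $t = 0$). Substituting $\phi$ in (101) we find

$$-\int_0^T\!\!\int_{\mathbb{R}^{2d}} g\,\bar\phi(x)\,\zeta_\varepsilon'(t)\,e^{-H} = \int_{\mathbb{R}^{2d}} g^0\,\bar\phi(x)\,e^{-H} + o(1) \tag{116}$$

as $\varepsilon \to 0$. By Lemma C.4, $g$ admits trace values in $L^2(\mathbb{R}^{2d};e^{-H})$, and therefore passing $\varepsilon \to 0$ in (116) we find

$$\int_{\mathbb{R}^{2d}} \big(g|_{t=0}(x) - g^0(x)\big)\,\bar\phi(x)\,e^{-H}\,dx = 0.$$

Since $\bar\phi$ is arbitrary, $g|_{t=0} = g^0$.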

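Finally we prove the uniqueness in $Y$ of the solution of (101). Consider two solutions $g_1, g_2 \in Y$ and let $g = g_1 - g_2$. Since the initial data and the right-hand side $U$ in (101) vanish, we have $E_\lambda(g,\phi) = 0$ for all $\phi \in C_c^\infty([0,T)\times\mathbb{R}^{2d})$. Taking a sequence $\phi_n \in C_c^\infty([0,T)\times\mathbb{R}^{2d})$ that converges in $X$ to $g$, we find

$$0 = \lim_{n\to\infty} E_\lambda(g,\phi_n)$$

$$= \lim_{n\to\infty}\Big[\langle \partial_t g - J\nabla H\cdot\nabla g, \phi_n\rangle_{X',X} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(\tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi_n + \big(\lambda+\tfrac12\Psi\big)\phi_n\Big) - \tfrac12\phi_n\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi_n\Big\}e^{-H}\Big]$$

$$= \langle \partial_t g - J\nabla H\cdot\nabla g, g\rangle_{X',X} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{\big(\lambda+\tfrac12\Psi\big)g^2 + |\nabla_p g|^2\Big\}e^{-H} \ge \tfrac12\int_{\mathbb{R}^{2d}} g^2|_{t=T}\,e^{-H} + \|g\|_X^2 \ge 0.$$

Hence $g = 0$, which proves uniqueness.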


□ 


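Remark C.6. Using the same technique as in the uniqueness proof above we can prove the following result. If $g \in Y$ satisfies $E_\lambda(g,\phi) = L_\lambda(\phi)$ for all $\phi \in C_c^\infty([0,T)\times\mathbb{R}^{2d})$, then for all $\phi \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$ we have

$$E_\lambda(g,\phi) = L_\lambda(\phi) - \int_{\mathbb{R}^{2d}} g\phi\big|_{t=T}\,e^{-H} = \langle e^{-\lambda t}U, \phi\rangle_{X',X} - \int_{\mathbb{R}^{2d}} g\phi\,e^{-H}\Big|_{t=0}^{t=T}.$$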

Theorem C.l proves the well-posedness of the variational equation (101) which is a weak form for the 
time-rescaled equation (98). Transforming back, we also conclude the well-posedness of the variational 
equation corresponding to the original equation (96). We state this in the following corollary. 

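Corollary C.7. Assume that

$$\Psi \in C_b(\mathbb{R}\times\mathbb{R}^{2d}), \qquad U \in X', \qquad\text{and}\qquad g^0 \in L^2(\mathbb{R}^{2d};e^{-H}).$$

Then there exists a unique solution $g$ to the variational equation

$$E(g,\phi) = L(\phi), \qquad \forall\phi \in C_c^\infty([0,T)\times\mathbb{R}^{2d}), \tag{117}$$

in the class of functions $Y$. Here $E : X\times C_c^\infty([0,T)\times\mathbb{R}^{2d}) \to \mathbb{R}$ and $L : C_c^\infty([0,T)\times\mathbb{R}^{2d}) \to \mathbb{R}$ are given by

$$E(g,\phi) := \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(-\partial_t\phi + J\nabla H\cdot\nabla\phi + \tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \tfrac12\Psi\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi\Big\}e^{-H}, \tag{118}$$

$$L(\phi) := \langle U, \phi\rangle_{X',X} + \int_{\mathbb{R}^{2d}} g^0\,\phi|_{t=0}\,e^{-H}. \tag{119}$$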

C.2 Bounds and Regularity Properties 

Having discussed the well-posedness of equation (96), in this section we derive some properties of its solution. 
These properties play an important role in the proof of Theorem B.2. 


C.2.1 Comparison principle and growth at infinity 

We first provide an auxiliary lemma which we require to prove the comparison principle. 
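Lemma C.8. For $g \in Y$, define $g^- \in X$ by $g^- := \max\{-g, 0\}$. Then

$$\langle \partial_t g - J\nabla H\cdot\nabla g, g^-\rangle_{X',X} = -\frac12\int_{\mathbb{R}^{2d}} (g^-)^2\,e^{-H}\Big|_{t=0}^{t=T}. \tag{120}$$

Proof. Since $C_c^\infty([0,T]\times\mathbb{R}^{2d})$ is dense in $Y$ by Lemma C.3, it is sufficient to prove (120) for $g \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$. For such $g$ the function $g^-$ is Lipschitz continuous and belongs to $X$, and there exists a sequence $\phi_n \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$ such that $\phi_n \to g^-$ in $X$. We have

$$\langle \partial_t g - J\nabla H\cdot\nabla g, g^-\rangle_{X',X} = \lim_{n\to\infty}\langle \partial_t g - J\nabla H\cdot\nabla g, \phi_n\rangle_{X',X} = \lim_{n\to\infty}\int_0^T\!\!\int_{\mathbb{R}^{2d}} \phi_n\,(\partial_t g - J\nabla H\cdot\nabla g)\,e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} g^-\,(\partial_t g - J\nabla H\cdot\nabla g)\,e^{-H} = -\int_0^T\!\!\int_{\mathbb{R}^{2d}} g^-\,(\partial_t g^- - J\nabla H\cdot\nabla g^-)\,e^{-H} = -\frac12\int_{\mathbb{R}^{2d}} (g^-)^2\,e^{-H}\Big|_{t=0}^{t=T}.$$

□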


We now prove the comparison principle. 

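Proposition C.9 (Comparison principle). Let $g$ be the solution given by Corollary C.7. Then

1. $g^0 \ge 0$ and $U \ge 0$ $\Longrightarrow$ $g \ge 0$.

2. $g^0 \in L^\infty(\mathbb{R}^{2d})$ and $U \in L^1(0,T;L^\infty(\mathbb{R}^{2d}))$ $\Longrightarrow$ $g \in L^\infty([0,T]\times\mathbb{R}^{2d})$ with

$$\|g(t)\|_{L^\infty} \le \|g^0\|_{L^\infty} + \int_0^t \|U(s)\|_{L^\infty}\,ds.$$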

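Proof. Let $g$ be the solution of the transformed variational equation (101) provided by Theorem C.1, which reads explicitly

$$0 = \langle \partial_t g - J\nabla H\cdot\nabla g, \phi\rangle_{X',X} - \langle e^{-\lambda t}U, \phi\rangle_{X',X} - \int_{\mathbb{R}^{2d}} g^0\,\phi|_{t=0}\,e^{-H} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(\tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi + \big(\lambda+\tfrac12\Psi\big)\phi\Big) - \tfrac12\phi\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi\Big\}e^{-H}.$$

Consider a sequence $\phi_n \to g^-$ in $X$ as $n \to \infty$, with $\phi_n \ge 0$. Then by the assumptions on $U$ and $g^0$ we have

$$\langle e^{-\lambda t}U, \phi_n\rangle_{X',X} + \int_{\mathbb{R}^{2d}} g^0\,\phi_n|_{t=0}\,e^{-H} \ge 0,$$

and therefore

$$0 \le \lim_{n\to\infty}\Big[\langle \partial_t g - J\nabla H\cdot\nabla g, \phi_n\rangle_{X',X} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(\tfrac12 J\nabla\psi*\rho_t\cdot\nabla\phi_n + \big(\lambda+\tfrac12\Psi\big)\phi_n\Big) - \tfrac12\phi_n\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p\phi_n\Big\}e^{-H}\Big]$$

$$= \langle \partial_t g - J\nabla H\cdot\nabla g, g^-\rangle_{X',X} + \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{g\Big(\tfrac12 J\nabla\psi*\rho_t\cdot\nabla g^- + \big(\lambda+\tfrac12\Psi\big)g^-\Big) - \tfrac12 g^-\,J\nabla\psi*\rho_t\cdot\nabla g + \nabla_p g\cdot\nabla_p g^-\Big\}e^{-H}$$

$$= -\frac12\int_{\mathbb{R}^{2d}} (g^-)^2\,e^{-H}\Big|_{t=0}^{t=T} - \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big\{\big(\lambda+\tfrac12\Psi\big)|g^-|^2 + |\nabla_p g^-|^2\Big\}e^{-H},$$

where the last equality follows by Lemma C.8. Since $g^-|_{t=0} = 0$ and $\lambda > \tfrac12\|\Psi\|_\infty + 1$ by assumption (97), this implies that $g^- = 0$.

This completes the proof of the first part of Proposition C.9. The second part is a simple consequence of the first part, by applying the first part to the function $\tilde g \in Y$, $\tilde g(t) := \|g^0\|_\infty + \int_0^t \|U(s)\|_{L^\infty}\,ds - g(t)$, which satisfies an equation of the same form. □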


In the next result we use the comparison principle to prove explicit bounds on the solution of equation (96) when $U = 0$.

Proposition C.10 (Growth). Assume that $\inf H = 0$ and $0 < \alpha_1 \le g^0 \le \alpha_2 < \infty$. Then the solution of the variational problem (117) with $U = 0$ satisfies

$$\alpha_1\exp\big(-\beta_1 t\sqrt{\omega_1+H}\big) \le g \le \alpha_2\exp\big(\beta_2 t\sqrt{\omega_2+H}\big)$$

for some fixed constants $\beta_1, \beta_2, \omega_1, \omega_2 > 0$.

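Proof. We first prove the second inequality in Proposition C.10. For some constants $\beta_2 > 0$, $\omega_2 > 1$ to be specified later, we define $g_2 := \alpha_2\exp(\beta_2 t\sqrt{\omega_2+H}) \in Y$, such that $g_2|_{t=0} = \alpha_2$. We will show that $g_2 - g$ satisfies the assumptions of Proposition C.9.

Substituting $g_2 - g$ in (118) and using the smoothness of $g_2$ we find

$$E(g_2 - g, \phi) = \langle U_2, \phi\rangle_{X',X} + \int_{\mathbb{R}^{2d}} \big(g_2|_{t=0} - g^0\big)\phi|_{t=0}\,e^{-H}$$

with

$$U_2 = \partial_t g_2 - J\nabla H\cdot\nabla g_2 - J\nabla(\psi*\rho_t)\cdot\nabla g_2 + \nabla_pH\cdot\nabla_p g_2 - \Delta_p g_2 - \frac{g_2}{2}\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big).$$

By construction $g_2|_{t=0} - g^0 \ge 0$. We now show that $U_2 \ge 0$. We calculate

$$\partial_t g_2 - J\nabla H\cdot\nabla g_2 - J\nabla(\psi*\rho_t)\cdot\nabla g_2 + \nabla_pH\cdot\nabla_p g_2 - \Delta_p g_2 - \frac{g_2}{2}\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big)$$

$$\ge g_2\Big[\tfrac12\beta_2\sqrt{\omega_2+H} - \tfrac12\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big) + \tfrac12\beta_2\sqrt{\omega_2+H} - c\beta_2 t - \tilde c\,\beta_2^2t^2\Big],$$

where the constants $c, \tilde c$ are independent of $\beta_2$ and $\omega_2$, using the uniform bounds on $\Delta H$ and the bound $|\nabla H|^2 \le C(1+H)$. Because of this growth condition on $\nabla H$, we can choose $\beta_2, \omega_2$ large enough such that

$$\tfrac12\beta_2\sqrt{\omega_2+H} \ge \tfrac12\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big).$$

Then we choose $\omega_2$ even larger such that for any $t \in [0,T]$

$$\tfrac12\beta_2\sqrt{\omega_2+H} \ge \tfrac12\beta_2\sqrt{\omega_2} \ge c\beta_2 t + \tilde c\,\beta_2^2t^2.$$

For these values of $\beta_2, \omega_2$ we therefore have

$$U_2 = \partial_t g_2 - J\nabla H\cdot\nabla g_2 - J\nabla(\psi*\rho_t)\cdot\nabla g_2 + \nabla_pH\cdot\nabla_p g_2 - \Delta_p g_2 - \frac{g_2}{2}\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big) \ge 0.$$

Using the comparison principle of Proposition C.9 we then obtain

$$g \le \alpha_2\exp\big(\beta_2 t\sqrt{\omega_2+H}\big).$$

Proceeding similarly it also follows that $g_1 := \alpha_1\exp(-\beta_1 t\sqrt{\omega_1+H})$ is a subsolution for (96) for appropriately chosen $\beta_1$ and $\omega_1$, and the first inequality in Proposition C.10 follows. □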

In the next result we make a specific choice for $\Psi$ (which corresponds to the Fisher Information for the VFP equation) and show that with this choice, the $L^2(e^{-H})$ norm of the solution of (96) decreases in time.

Proposition C.11. The solution $g$ of the variational problem (117) (in the sense of Corollary C.7) with $U = 0$ and

$$\Psi = \nabla_p\varphi\cdot\nabla_pH - \Delta_p\varphi + \tfrac12|\nabla_p\varphi|^2 \tag{121}$$

for some $\varphi \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$ satisfies

$$\int_{\mathbb{R}^{2d}} g^2\,e^{-H}\Big|_{t=0}^{t=T} \le 0.$$

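Proof. Let $g \in Y$ be the solution given by Corollary C.7. Since $g \in X$, there exists a sequence $\phi_n \in C_c^\infty((0,T)\times\mathbb{R}^{2d})$ such that $\phi_n \to g$ in $X$. Furthermore $\partial_t g - J\nabla H\cdot\nabla g \in X'$ and we have

$$\langle \partial_t g - J\nabla H\cdot\nabla g, g\rangle_{X',X} = \lim_{n\to\infty}\langle \partial_t g - J\nabla H\cdot\nabla g, \phi_n\rangle_{X',X}.$$

Using the same approximation arguments as in the proof of the comparison principle we find

$$\frac12\int_{\mathbb{R}^{2d}} g^2\,e^{-H}\Big|_{t=0}^{t=T} = \langle \partial_t g - J\nabla H\cdot\nabla g, g\rangle_{X',X}$$

$$= \lim_{n\to\infty}\int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(\tfrac12\phi_n\,J\nabla\psi*\rho_t\cdot\nabla g - \tfrac12 g\,J\nabla\psi*\rho_t\cdot\nabla\phi_n - \nabla_p g\cdot\nabla_p\phi_n - \tfrac12 g\phi_n\Psi\Big)e^{-H}$$

$$= \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(-|\nabla_p g|^2 - \tfrac12 g^2\Psi\Big)e^{-H}.$$

Using Lemma C.4 and substituting (121) into this relation we find

$$\frac12\int_{\mathbb{R}^{2d}} g^2\,e^{-H}\Big|_{t=0}^{t=T} = \int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big(-|\nabla_p g|^2 + \tfrac12 g^2\Big[\Delta_p\varphi - \nabla_p\varphi\cdot\nabla_pH - \tfrac12|\nabla_p\varphi|^2\Big]\Big)e^{-H}$$

$$= -\int_0^T\!\!\int_{\mathbb{R}^{2d}} \Big|\nabla_p g + \tfrac12 g\,\nabla_p\varphi\Big|^2 e^{-H} \le 0,$$

where the second equality follows by applying integration by parts to the $\Delta_p\varphi$ term. This completes the proof. □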
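The sign in Proposition C.11 comes from completing a square after one integration by parts against the weight. The sketch below checks this identity numerically in a deliberately simplified setting; everything specific in it is an assumption made for illustration: one momentum variable, $H(p) = p^2/2$, no time dependence, and the hypothetical functions $g(p) = 1 + \tfrac12\sin p$ and $\varphi(p) = \sin p + \tfrac12\cos 2p$.

```python
import math

def trapezoid(func, a, b, n):
    """Composite trapezoid rule for func on [a, b] with n subintervals."""
    h = (b - a) / n
    total = 0.5 * (func(a) + func(b))
    for i in range(1, n):
        total += func(a + i * h)
    return total * h

# Assumed toy data (not from the paper): H(p) = p^2/2, so H'(p) = p.
def weight(p):
    return math.exp(-0.5 * p * p)

def g(p):
    return 1.0 + 0.5 * math.sin(p)

def g_prime(p):
    return 0.5 * math.cos(p)

def phi_prime(p):                      # derivative of sin(p) + 0.5*cos(2p)
    return math.cos(p) - math.sin(2.0 * p)

def phi_second(p):
    return -math.sin(p) - 2.0 * math.cos(2.0 * p)

def Psi(p):
    # One-dimensional analogue of (121): phi' H' - phi'' + 0.5 * phi'^2
    return phi_prime(p) * p - phi_second(p) + 0.5 * phi_prime(p) ** 2

def dissipation(p):
    # -( |g'|^2 + 0.5 * g^2 * Psi ) e^{-H}
    return -(g_prime(p) ** 2 + 0.5 * g(p) ** 2 * Psi(p)) * weight(p)

def completed_square(p):
    # -| g' + 0.5 * g * phi' |^2 e^{-H}
    return -((g_prime(p) + 0.5 * g(p) * phi_prime(p)) ** 2) * weight(p)

lhs = trapezoid(dissipation, -12.0, 12.0, 6000)
rhs = trapezoid(completed_square, -12.0, 12.0, 6000)
print(lhs, rhs)
```

The two integrals agree, and the second is manifestly non-positive, which mirrors the final step of the proof.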
C.2.2 Regularity


In this section we prove certain regularity properties for the solution of equation (96). We first present a 
general result regarding regularity of kinetic equations. This result is a combination of Theorem 1.5 and 
Theorem 1.6 [Bou02]. The main difference is that we assume more control on the second derivative with 
respect to momentum, which also gives us a stronger regularity in the position variable. 

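Proposition C.12. Assume that

$$\partial_t f + p\cdot\nabla_q f - \sigma\Delta_p f = g \qquad\text{in } \mathbb{R}\times\mathbb{R}^{2d} \tag{122}$$

holds with $\sigma > 0$ and

$$f, g \in L^2(\mathbb{R}\times\mathbb{R}^{2d}), \qquad \nabla_p f, \nabla_p g \in L^2(\mathbb{R}\times\mathbb{R}^{2d}).$$

Then $\Delta_p f, \nabla_q f \in L^2(\mathbb{R}\times\mathbb{R}^{2d})$, $\partial_t f \in L^2_{\mathrm{loc}}(\mathbb{R}\times\mathbb{R}^{2d})$ and

$$\|\nabla_q f\|_{L^2} \le c\big(\|\nabla_p g\|_{L^2} + \|f\|_{L^2}\big).$$

Proof. From [Bou02, Theorem 1.5] it follows that $\Delta_p f \in L^2(\mathbb{R}\times\mathbb{R}^{2d})$ with

$$\sigma\|\Delta_p f\|_{L^2} \le C_d\|g\|_{L^2},$$

for a constant $C_d$ that only depends on the dimension $d$. This implies that the Hessian in the $p$-variable satisfies $D_p^2 f \in L^2(\mathbb{R}\times\mathbb{R}^{2d})$ as well.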

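To prove the Proposition, we first assume that $f, g \in C_c^\infty([0,T]\times\mathbb{R}^{2d})$. We will later extend the results to the low-regularity situation via regularization arguments.

Writing $(f,g) = \int_{\mathbb{R}\times\mathbb{R}^{2d}} fg$ and using integration by parts we have

$$\|\partial_{q_i}f\|^2_{L^2} = (\partial_{q_i}f, \partial_{q_i}f) = \big(\partial_{q_i}f,\ \partial_{p_i}(\partial_t + p\cdot\nabla_q)f - (\partial_t + p\cdot\nabla_q)\partial_{p_i}f\big)$$

$$= \big(\partial_{q_i}f, \partial_{p_i}(\partial_t + p\cdot\nabla_q)f\big) + \big(\partial_{q_i}(\partial_t + p\cdot\nabla_q)f, \partial_{p_i}f\big) = -2\big(\partial_{q_i}\partial_{p_i}f, \sigma\Delta_p f\big) + 2\big(\partial_{q_i}f, \partial_{p_i}g\big)$$

$$\le 0 + 2\|\partial_{q_i}f\|_{L^2}\|\partial_{p_i}g\|_{L^2}. \tag{123}$$

Here we have used the (hypoelliptic) relation $\partial_{q_i} = \partial_{p_i}(\partial_t + p\cdot\nabla_q) - (\partial_t + p\cdot\nabla_q)\partial_{p_i}$ to arrive at the second equality. The final inequality follows since $f$ is real-valued, which implies that $|\hat f|^2$ is an even function and therefore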

{dg^dyJs,R,Apf) = [ = 0 , 

where $\xi, \eta$ are the Fourier variables corresponding to $q, p$.

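Inequality (123) gives

$$\|\partial_{q_i}f\|_{L^2} \le 2\|\partial_{p_i}g\|_{L^2}. \tag{124}$$

Since $\nabla_q f, \Delta_p f, g \in L^2(\mathbb{R}\times\mathbb{R}^{2d})$, using (122) we have $\partial_t f \in L^2_{\mathrm{loc}}(\mathbb{R}\times\mathbb{R}^{2d})$. This proves the result for smooth and compactly supported $f$ and $g$.

Let us now consider general $f, g \in L^2(\mathbb{R}\times\mathbb{R}^{2d})$ as in the Proposition, and define $f_\delta := \nu_\delta * f$ and $g_\delta := \nu_\delta * g$, where $\nu_\delta$ is a regularizing sequence in $\mathbb{R}\times\mathbb{R}^{2d}$. Then we have

$$\partial_t f_\delta + p\cdot\nabla_q f_\delta - \sigma\Delta_p f_\delta = g_\delta + \tilde g_\delta,$$

where $\tilde g_\delta = p\cdot\nabla_q f_\delta - \nu_\delta*(p\cdot\nabla_q f)$. Next we define $f_{\delta,R} := f_\delta\chi_R$ and $g_{\delta,R} := g_\delta\chi_R$, where

$$\chi_R(x) = \chi_1(x/R), \qquad\text{where } \chi_1 \in C_c^\infty(\mathbb{R}^{2d}),\ \chi_1(x) = 1 \text{ for } |x| \le 1,\ \chi_1(x) = 0 \text{ for } |x| \ge 2.$$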
Then we have


dtfs,R + p ■ - ^pfs,R = {gs + gs) ^r + gs,R =■ gs,R, 

where 


gs,R = fsp ■ '^q^R — fs^p^R + ^pfs • VpXfl. (125) 

Note that fs,R,gs,R € x To apply (124) we need to show that gs,R,'^pgs,R G T^(IR x M^*^). In 

fact we will show that gs,R,^pgs,R are bounded in L^(IR x M^'^) independently of <5 and R with 

\\^pgs,R.\\L- < Ci\\Wpgh 2 + ll/IU. + IIVp/lUO . (126) 

Combining with estimate (124), we have Vg/ € T^(M x M^*^) with 

| 1 VJ|U 2 = hm IjVJyfllU. ^CdlVpffllL^ + ll/IU^ + llVp/IU.)- 

d —>-00 

Now we prove that gs^R. satisfies inequality (126). Since the equations are defined in a distributional 
sense, for any (j) G (^“(M x M^'^) we have 


/ gS(l>= [-fsp ■ V,(() + fp ■ VgRS *</>]=/ [-fl^S * (P • Vq(()) + fp ■ VgRS * f)] 

7Kl + 2d jRl + 2d Jm + 2d 

< ll/llLd|i^<5 * (p • V,(()) + P • VgRs * (t>\\L^ 

< II/IIl 2|I«<5 IUi|I^IU 2 < c'||/||z, 2 ||(()||l 2. 

where Ks{q,p) = |p||Vgi 25 (( 7 ,p)|. Here the final inequality follows from Lemma C.2 since ||«; 5 ||li < C inde¬ 
pendent of 6 (recall (105)). As a result of this calculation it follows that 

I|5<5 ||l 2 < C||/||i2, 

where C is independent of d. 

A similar calculation for Vpgs gives, using implicit summation over repeated indices, 

/ gsdp:i(l>= [-{vs* f)iPzdg^p.(l>) +fpidg,{Rs*dp-(l))] = [dq,(l)dp.{p,Rs* f) + fPtdp^ivs *dg,(l>)] 

jR^d J^2d J^2d 

= [ [dq^ (j) dp. (p^Rs * f)-Rs* dg^ d dp. {fpi)] 

= / [dq,(t>{piVs*dpJ+ 5ijVs* f)-vs^dq^diPidpJ+ 5ijf)] 

= / dpJ[vs*{Pidqi(j))-PtV5*dq^(j)]<C\\dpJ\\L2\\Vp^\\oc>\\(l)\\L^, 

jR2d 

where C is independent of 5, implying 


\\dpM\<C\\^pf\\L-- 

Now let us consider gs^R (defined in (125)). Since |VpXi^| < \/R and |ApX_R| < 1/i?^, it follows that 

\\gs,R\\L^ < cwfsp■ v,Xfl|U 2 +1^11/,II + ^llVpMI < CII/IU 2 + ^II/II + ^||Vj,/||, 
be. gs,R is bounded in T^(IR x M^"^) independent of <5, R. A similar calculation shows that 

\\dprg 5 ,R\\L-<C{R)^^Q. 

This completes the proof. □ 
We now use Proposition C.12 to prove regularity properties of equation (96).

Proposition C.13. Let g be the solution of the variational problem (117) (in the sense of Corollary C.7) 
with [7 = 0 and with initial datum € X. If g^ € C'^(IR^'^) n X, then g satisfies 

dtg,Xg,Apg€Ll^i[0,T]xR^^). 

Proof. Let g be the solution of the variational problem (117) in the sense of Corollary C.7, but on the time 
interval [0,oo); since Corollary C.7 guarantees existence and uniqueness on any finite interval, this g is well 
defined. We extend g to all t by setting 


git) ■= 



t < 0 
t > 0 


We next recast the variational problem (117) in the form used in Proposition C.12. Changing p to —p 
and rearranging (117) we find, also using Remark C.6, for all <j) G C'(?°(M x 




f 

10 




[ [ {g i-dt(l> - p + '^pg 

Jo JwG‘‘ '■ ^ 

• Xp(j) - ^gXqlf * Pt ■ Xp<f) - * Pt ■ Vpgj 

With the choice f = (j>e^, where (j> € Cf°(R x K^"^) we rewrite this as 
^ - P ■ + Vpg • Vp^l 

/ / {Xpg-XpH(j)- gXqV ■Xp^-\gXq'tp* pfXp^-l-gXqf)* pfXpH(j) 

Jo J«?'^ '■ l l 

- \g^<( + * Pt ■ Vpffj 


g<( 


t^T 


^-H 


i=0 


r 

lo 


/]R2d 


gf) 


t=T 


t=0 


■ (127) 


After combining this expression with similar expressions for the regions t > T and t < 0, we find that these 
expressions form the distributional version of the equation 


dtg - P^qg - Apg = G in M x 


i)2d 


(128) 


where 


G = 


-pVqg° - Apg° 

^pg ' ^q^ ^qf^ * Pt ' ^pP ^pP ' ^pdd 


t < 0 

I 5 {Xqlf * Pt ■ XpH + ^') t > 0. 


(129) 


Since g,Xpg G L^(0, T; L^(e“^)) C Lfoc(® ^ l^y assumption € C'^(M^'^), it follows that G G 

LfQ(,(M X After a smooth truncation. Theorem 1.5 of [Bou02] implies that Apg G x M^"^). Using 

this additional regularity in the definition of G (129), it then follows that XpG G L^oc(® ^ Applying 

Proposition C.12 to a truncated version of (128) then implies the result. □ 

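Remark C.14. From Proposition C.13 it follows that the solution of the variational problem (117) satisfies the original equation (96) (with the choice $U = 0$)

$$\partial_t g - J\nabla H\cdot\nabla g - J\nabla(\psi*\rho_t)\cdot\nabla g + \nabla_pH\cdot\nabla_p g - \Delta_p g - \frac{g}{2}\big(J\nabla H\cdot\nabla\psi*\rho_t - \Psi\big) = 0, \qquad g|_{t=0} = g^0,$$

in $L^2_{\mathrm{loc}}$ (i.e. the derivatives are in $L^2_{\mathrm{loc}}$).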
D Proof of Theorem 3.1

In this section we prove Theorem 3.1. We will use the following alternative definition of the rate functional 


T 

\ht\‘^dptdt 

I 0 R2<i 
[+00 

where e > 0 is fixed. 



ifdtpt = e ^ div^pJVH) + App-divpiptht), for/i e 2.^(0, T; L^(p)), 
and p\t=o = Po, 

otherwise, 

(130) 


Proof of Theorem 3.1. We first show that the estimate (49) holds. Since $\rho$ satisfies $I(\rho) < C$, using the definition (130) of the rate functional we find that there exists $h \in L^2(0,T;L^2_\nabla(\rho))$ such that for any $f \in C_c^2(\mathbb{R}^2)$

$$\frac{d}{dt}\int_{\mathbb{R}^2} f\,d\rho_t = \int_{\mathbb{R}^2}\Big(\frac1\varepsilon J\nabla H\cdot\nabla f + \Delta_p f + \nabla_p f\cdot h_t\Big)d\rho_t. \tag{131}$$

Formally substituting $f = H$ in (131), using $J\nabla H\cdot\nabla H = 0$ and the growth conditions on $H$ (see (A2)), we find

$$\partial_t\int_{\mathbb{R}^2} H\,d\rho_t = \int_{\mathbb{R}^2}\big(\Delta_pH + \nabla_pH\cdot h_t\big)\,d\rho_t \le C + \frac12\int_{\mathbb{R}^2}|\nabla_pH|^2\,d\rho_t + \frac12\int_{\mathbb{R}^2}|h_t|^2\,d\rho_t \le C + C\int_{\mathbb{R}^2} H\,d\rho_t + \frac12\int_{\mathbb{R}^2}|h_t|^2\,d\rho_t.$$

The bound $\int H\rho_t \le C$ then follows by applying a Gronwall-type estimate, integrating in time over $[0,T]$, and using the fact that $h \in L^2(0,T;L^2_\nabla(\rho))$. To make the choice $f = H$ admissible in the definition (130) of the rate functional we use a two-step approximating argument. We first extend the class of admissible functions from $C_c^2(\mathbb{R}^2)$ to

A := G C'^(IR^) : sup (1 -I- |a;|)|A(a:)| < ooj. 
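The Gronwall-type step used above can be illustrated numerically. The sketch below is only a sanity check under assumptions that are not in the text: it takes the extremal case $y' = C + Cy + \tfrac12 a(t)$ of the differential inequality for $y(t) = \int H\,d\rho_t$, with hypothetical values of $C$, $T$, $y(0)$ and a hypothetical non-negative function $a(t)$ standing in for $\int |h_t|^2\,d\rho_t$, and compares the explicit Euler solution with the bound $e^{CT}\big(y(0) + \int_0^T (C + \tfrac12 a)\big)$.

```python
import math

def euler_energy(y0, C, a, T, n):
    """Explicit Euler for the extremal case y' = C + C*y + 0.5*a(t) on [0, T].
    Returns the final value and the left Riemann sum of the forcing C + 0.5*a."""
    h = T / n
    y = y0
    forcing_sum = 0.0
    for k in range(n):
        b = C + 0.5 * a(k * h)
        y = y + h * (C * y + b)
        forcing_sum += h * b
    return y, forcing_sum

# Hypothetical data chosen for illustration only.
C, T, y0 = 1.5, 2.0, 0.7

def a(t):
    """Non-negative stand-in for the control cost at time t."""
    return 1.0 + math.sin(5.0 * t) ** 2

y_T, forcing = euler_energy(y0, C, a, T, 20000)
gronwall_bound = math.exp(C * T) * (y0 + forcing)
print(y_T, gronwall_bound)
```

For the Euler iterates the bound holds exactly, because $(1 + Ch)^m \le e^{Chm} \le e^{CT}$ for every step count $m \le n$; this is the discrete version of the Gronwall argument.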

For a given $F \in \mathcal{A}$, define the sequence $f_k(x) = F(x)\xi_k(x) \in C_c^2(\mathbb{R}^2)$, where $\xi_k \in C_c^\infty(\mathbb{R}^2)$ is a sequence of smoothed characteristic functions converging pointwise to one, with $0 \le \xi_k \le 1$, $|\nabla\xi_k| \le 1/k$, and $|\nabla^2\xi_k| \le 1/k^2$. Then $|J\nabla H\cdot\nabla f_k|$, $\Delta_p f_k$, and $|\nabla_p f_k|^2$ are bounded uniformly and converge pointwise to the corresponding terms with $f_k$ replaced by $F$; convergence of the integrals follows by the Dominated Convergence Theorem. In the second step, we extend $\mathcal{A}$ to include $H(q,p)$ by using an approximating sequence $\mathcal{A} \ni g_k(q,p) = H(q,p)\,\psi_k(H(q,p))$, where $\psi_k : \mathbb{R} \to \mathbb{R}$ is defined as $\psi_k(s) := (1 + |s|/k)^{-1}$. Note that $\psi_k \to 1$ pointwise as $k \to \infty$. Proceeding as described in the formal calculations above we find

\[
\frac{d}{dt}\int_{\mathbb{R}^2} g_k\,d\rho_t \le C\Big(1 + \int_{\mathbb{R}^2} g_k\,d\rho_t + \int_{\mathbb{R}^2}|h_t|^2\,d\rho_t\Big),
\]

where $C$ is independent of $k$ and $\varepsilon$. Using a Gronwall-type estimate, integrating in time over $[0,T]$ and applying the monotone convergence theorem we find (49).
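The Gronwall step used above can be written out as follows. This is a sketch under assumptions that are not restated in this appendix: $H \ge 0$ (so that $0 \le g_k \le H$), the initial energy $\int H\,d\rho_0$ is finite, and $J$ is antisymmetric. The shorthand $y_k$ is introduced here purely for illustration.

```latex
% Shorthand (introduced here): y_k(t) := \int g_k \, d\rho_t .
% The \varepsilon^{-1} term vanishes for any function of H, since
%   \nabla g_k = (\psi_k(H) + H\,\psi_k'(H))\,\nabla H  and  J\nabla H\cdot\nabla H = 0.
\begin{align*}
y_k'(t) &\le C\Big(1 + y_k(t) + \int |h_t|^2\,d\rho_t\Big), \\
y_k(t)  &\le e^{Ct}\Big(y_k(0) + C\int_0^t\Big(1 + \int |h_s|^2\,d\rho_s\Big)\,ds\Big) \\
        &\le e^{CT}\Big(\int H\,d\rho_0 + CT + 2C\,I(\rho)\Big).
\end{align*}
% The last line uses \int_0^T\!\int |h_t|^2\,d\rho_t\,dt = 2\,I(\rho).
```

The right-hand side depends neither on $k$ nor on $\varepsilon$, so letting $k \to \infty$ transfers the bound to $\int H\,d\rho_t$, uniformly in $t \in [0,T]$.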

Next we prove (50). The main idea of the proof is to consider a modified equation for which an estimate 
of the type (50) holds, and then arrive at (50) by passing to an appropriate limit. 

We consider the following modification of equation (41), 

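\[
\partial_t\rho = -\frac{1}{\varepsilon}\operatorname{div}(\rho J\nabla H) + \alpha\operatorname{div}_p(\rho\nabla_p H) + \Delta_p\rho,
\tag{132}
\]

where $\alpha > 0$. Essentially, we have added a friction term to equation (41), as a result of which $\mu^\alpha(dq\,dp) = Z_\alpha^{-1}e^{-\alpha H}\,dq\,dp$ is a stationary measure for (132) ($Z_\alpha$ is the normalization constant).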
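The stationarity claim can be checked directly. The sketch below writes $m_\alpha := Z_\alpha^{-1}e^{-\alpha H}$ for the density (notation introduced here) and assumes, as is standard for the canonical symplectic matrix, that $J$ is constant and antisymmetric and that $H$ is twice continuously differentiable.

```latex
\begin{align*}
\nabla m_\alpha &= -\alpha\, m_\alpha \nabla H, \\
\operatorname{div}(m_\alpha J\nabla H)
  &= \nabla m_\alpha\cdot J\nabla H + m_\alpha \operatorname{tr}(J\nabla^2 H)
   = -\alpha\, m_\alpha\,\nabla H\cdot J\nabla H + 0 = 0, \\
\alpha\operatorname{div}_p(m_\alpha\nabla_p H) + \Delta_p m_\alpha
  &= \operatorname{div}_p\big(\alpha\, m_\alpha\nabla_p H + \nabla_p m_\alpha\big)
   = \operatorname{div}_p(0) = 0 .
\end{align*}
```

The Hamiltonian part and the friction-plus-diffusion part of (132) therefore vanish separately on $m_\alpha$.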
The rate functional corresponding to (132) is


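\[
I_\alpha(\rho) =
\begin{cases}
\displaystyle \frac12\int_0^T\!\!\int_{\mathbb{R}^2} |h^\alpha_t|^2\,d\rho_t\,dt
  & \text{if } \partial_t\rho = -\dfrac{1}{\varepsilon}\operatorname{div}(\rho J\nabla H) + \Delta_p\rho + \operatorname{div}_p\big(\rho\,[\alpha\nabla_p H - h^\alpha_t]\big), \\
  & \text{for } h^\alpha \in L^2(0,T;L^2_\nabla(\rho)), \text{ and } \rho|_{t=0} = \rho_0, \\[4pt]
+\infty & \text{otherwise}.
\end{cases}
\tag{133}
\]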

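Note that equation (132) is a special case of the VFP equation (with the choice $\psi = 0$) and therefore the proof of Theorem 2.3 also applies to this case. We follow the proof up to (95) (adding a constant $\alpha$ to the friction) to find for any $\tau \in [0,T]$

\[
\mathcal{H}(\rho_\tau|\mu^\alpha) + \int_0^\tau\!\!\int_{\mathbb{R}^2}\Big(\Delta_p\varphi - \alpha\nabla_p H\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Big)\,d\rho_t\,dt \le I_\alpha(\rho) + \mathcal{H}(\rho_0|\mu^\alpha),
\]

for any $\varphi \in C_c^\infty(\mathbb{R}\times\mathbb{R}^2)$. Using the definition of relative entropy, which gives $\mathcal{H}(\rho|\mu^\alpha) = \mathcal{F}(\rho) + \alpha\int H\rho + \log Z_\alpha$, we have

\[
\mathcal{F}(\rho_\tau) + \int_0^\tau\!\!\int_{\mathbb{R}^2}\Big(\Delta_p\varphi - \alpha\nabla_p H\cdot\nabla_p\varphi - \frac12|\nabla_p\varphi|^2\Big)\,d\rho_t\,dt \le I_\alpha(\rho) + \mathcal{F}(\rho_0) + \alpha\int_{\mathbb{R}^2} H\rho_0 - \alpha\int_{\mathbb{R}^2} H\rho_\tau.
\tag{134}
\]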

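Below we show that $I_\alpha(\rho) \to I(\rho)$ as $\alpha \to 0$. Then passing to the limit $\alpha \to 0$ in (134) we find

\[
\mathcal{F}(\rho_\tau) + \int_0^\tau\!\!\int_{\mathbb{R}^2}\Big(\Delta_p\varphi - \frac12|\nabla_p\varphi|^2\Big)\,d\rho_t\,dt \le I(\rho) + \mathcal{F}(\rho_0),
\]

where we have used $|\nabla_p H|^2 \le C(1 + H)$ along with the estimate (49). The required inequality (50) then follows by taking the supremum over $\varphi \in C_c^\infty(\mathbb{R}\times\mathbb{R}^2)$.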
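The supremum in the last step can be computed formally. The sketch below assumes that, for fixed $t$, $\rho_t$ has a smooth, strictly positive density and that the formal maximizer can be approximated by admissible test functions. Whether (50) is stated with the supremum itself or with the resulting integral is not visible in this appendix.

```latex
\begin{align*}
\int \Big(\Delta_p\varphi - \tfrac12|\nabla_p\varphi|^2\Big)\rho_t\,dq\,dp
 &= \int \Big(-\nabla_p\varphi\cdot\nabla_p\log\rho_t
      - \tfrac12|\nabla_p\varphi|^2\Big)\rho_t\,dq\,dp \\
 &\le \frac12\int |\nabla_p\log\rho_t|^2\,\rho_t\,dq\,dp
  = \frac12\int \frac{|\nabla_p\rho_t|^2}{\rho_t}\,dq\,dp .
\end{align*}
% Pointwise inequality used: -a\cdot b - \tfrac12|a|^2 \le \tfrac12|b|^2,
% with equality for a = -b, i.e. formally for \varphi = -\log\rho_t.
```

Under these assumptions the supremum is a Fisher-information-type integral in the $p$ variable only.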

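To complete the proof we show that $I_\alpha(\rho) \to I(\rho)$ as $\alpha \to 0$. Comparing the definitions of the rate functionals for the original equation (130) and the modified equation (133) gives $h^\alpha_t = h_t + \alpha\nabla_p H$, so we can write the rate functional for the modified equation as

\[
I_\alpha(\rho) = \frac12\int_0^T\!\!\int_{\mathbb{R}^2}|h^\alpha_t|^2\,d\rho_t\,dt
 = \frac12\int_0^T\!\!\int_{\mathbb{R}^2}|h_t + \alpha\nabla_p H|^2\,d\rho_t\,dt
 = \frac12\int_0^T\!\!\int_{\mathbb{R}^2}\big(|h_t|^2 + \alpha^2|\nabla_p H|^2 + 2\alpha\nabla_p H\cdot h_t\big)\,d\rho_t\,dt
 \xrightarrow{\alpha\to0} I(\rho),
\]

where we have used $|\nabla_p H|^2 \le C(1 + H)$ and the estimate (49) to arrive at the convergence statement. Note that (49) along with the definition of the rate functionals implies that $I(\rho) < \infty$ iff $I_\alpha(\rho) < \infty$. □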
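The convergence $I_\alpha(\rho) \to I(\rho)$ can be made quantitative. The sketch below assumes that each $\rho_t$ is a probability measure and that (49) is a bound uniform in $t$, so that $M := \sup_{t\in[0,T]}\int H\,d\rho_t$ (shorthand introduced here) is finite. The constant $C$ is the one in $|\nabla_p H|^2 \le C(1+H)$.

```latex
\begin{align*}
\int_0^T\!\!\int |\nabla_p H|^2\,d\rho_t\,dt &\le CT(1+M), \\
|I_\alpha(\rho) - I(\rho)|
 &\le \frac{\alpha^2}{2}\int_0^T\!\!\int |\nabla_p H|^2\,d\rho_t\,dt
    + \alpha\int_0^T\!\!\int |\nabla_p H|\,|h_t|\,d\rho_t\,dt \\
 &\le \frac{\alpha^2}{2}\,CT(1+M)
    + \alpha\,\big(CT(1+M)\big)^{1/2}\big(2I(\rho)\big)^{1/2}
 \;\xrightarrow[\alpha\to0]{}\; 0 .
\end{align*}
```

The last step applies the Cauchy-Schwarz inequality in $L^2(d\rho_t\,dt)$. Because the cross term is estimated in absolute value, the bound holds regardless of its sign.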